Introduction

This technical report covers research carried out by the
Secure Distributed Processing Systems group at UCLA, under ARPA
Contract MDA-903-77-C-0211, during the three quarters in the
period July 1, 1977 to March 31, 1978. Significant advances have
been made on all four contracted tasks, namely network security,
data management security, high availability secure information
management, and UCLA secure system enhancements. Below, we
describe that progress and point to the list of references which
represent the published work resulting from this supported
research.


Task I - Network Security

A  number  of  significant  steps  have  been  taken  over  the  last 
three  quarters.  First,  UCLA  is  participating  in  the  larger  ARPA 
sponsored  network  security  experiment  employing  BCR  units  to 
demonstrate  that  end  to  end  encryption  of  individual  connections 
on  the  ARPANET  is  feasible.  A  BCR  unit  has  been  received  at  UCLA 
and checkout has begun. UCLA's initial role as a server host in
the BCR experiment, and potentially later as a key distribution
center, has been coordinated with other ARPA contractors.

A major portion of the effort in network security has been
devoted to the integration of encryption techniques into the
protocols of networks and the architecture of the operating systems
[...] making network control software, as well as all other system
software, irrelevant to system security, so long as an appropriate
operating system kernel was installed in participating hosts.
This development dramatically simplifies the structure of
network security mechanisms, obviating the need for BCR units at
secure hosts, as well as any requirement for trusted network
management software. A prototype of this integrated end to end
network security architecture has been developed for the UCLA
Secure Operating System Prototype. The implementation is now
being improved to integrate it into the complete system. The
final prototype is scheduled to be used in the Navy's ACCAT Guard
project later this year.

The  importance  of  this  work  is  severalfold.  First,  it 
demonstrates  that  end  to  end  security  to  the  process  level  is 
very  cheap  to  implement  and  operate,  given  the  existence  of 
secure operating systems. Second, the approach is directly
applicable to existing networks such as the ARPANET.

Several  architectural  design  and  analysis  efforts  have  also 
been  in  progress  during  this  period,  reported  in  references  6  and 
7.  The  first  presents  a  general  view  of  design  issues  in  network 
security, and has been used by the Mitre Corp. in the development
of their network security methods for military systems. The other
challenges much of the work on public key encryption methods,
and  shows  that  all  digital  signature  methods  previously  proposed 
suffer  from  serious  flaws.  A  superior  method  is  then  outlined. 


Task II - Data Management Security

Data management systems typically employ considerably more
software mechanism in the representation and management of the
data they contain than do operating systems. As a result, the
task of developing reliable enforcement controls is potentially
significantly more difficult, and many have thought that
a kernel architecture approach to data management security was
not feasible. If so, that would be quite unfortunate, since
kernels severely limit the amount of software which must operate
securely. At UCLA, we have succeeded in developing a general
kernel based architecture, meaning that it is potentially
feasible to provide highly reliable security enforcement in data
management systems through the correct installation of a very
small amount of software. This result is very important, since
without it, much of the code in a data management system would
have to operate securely, and the cost of providing secure data
management would then often be prohibitive. The design was
published late in 1977 as reference 2, and in order to demonstrate
its operational feasibility, the INGRES data management system is
currently being altered to include the proposed kernel
structures. An important result of this test implementation, besides
demonstrating feasibility, is that our approach is retrofittable
to existing systems. The savings in existing software can be
enormous.


Task III - High Availability Secure Information Management

[...] distributed systems. The core of this effort is the development of a
highly available, secure distributed system base that can run in
an integrated fashion on local networks, utilize existing
equipment, and provide a base on top of which one can easily install
such applications as distributed data management systems,
electronic office facilities, and the like. The base is to be
entirely responsible for backup, recovery, security, and much of
system management. It should be easily extensible in terms of
hardware additions and deletions, all without software
alterations or user knowledge.


The design of this system base has progressed a great deal
over the past three quarters, and a preliminary design document
was reviewed by other internationally known researchers. A more
complete design report will be completed this summer, at which
time prototype development will commence.

In conjunction with this research direction, a number of
other strong efforts have been completed. First is a complete
protocol for coordination of resources in a distributed system
that permits arbitrary failures of nodes, links, and software
modules, either during normal operation or recovery.
Synchronization suitable for sophisticated data management is provided,
and the correctness of the entire protocol is proven. The work
is reported in reference 5, and has been accepted for publication
in the top journal of the field.

Several other protocols have also been developed for
coordination of resources and detection of deadlock. These are now
undergoing refinement and have been submitted for publication.

To support the distributed system development, UCLA has
participated in the development of the Local Network Interfaces
(LNIs) principally developed by UCI and MIT, and three interfaces
are scheduled to be installed at UCLA this summer, creating a
three node network for use in development and measurement
experiments.


Task IV - UCLA Secure System Enhancement


The UCLA Secure system prototype has been the focus of
considerable progress. First, a great deal of energy has been spent
in completing the prototype. The basic task is now essentially
done. Full function operation has been demonstrated. Fine
grained protection on an individual file basis is implemented,
and virtually all standard Unix programs operate without change
or  recompilation.  This  system  is  the  first,  and  currently  only. 
[...] the next months, the remaining software development tasks
associated with the prototype will be completed and documentation
will be developed. A general architecture paper is already [...]

The UCLA kernel has also served as the focus for
considerable program verification activity. Complete concrete
specifications for the entire kernel, suitable as input to an interactive
verification system, were completed late last year. Complete
abstract specifications were completed early in 1978, and the
abstract to concrete correspondence is now well underway. The
programs being verified compose the largest practical
verification effort in the country, and it is uncovering a great deal
about how verification of large systems must be done, as well as
the nature of the tools which are required. As this research has
progressed, a cooperative arrangement with USC/Information
Sciences Institute has developed, since their XIVUS interactive
verification [...]

Encryption Protocols,
Public Key Algorithms and Digital Signatures
in Computer Networks


by 

Gerald  J.  Popek  and  Charles  S.  Kline 
University  of  California  at  Los  Angeles 


Abstract 

The general problem of secure communication in computer
networks is considered, especially issues related [...]
of encryption protocols, the relationship between public key and
conventional encryption algorithms, and digital signatures. The
conclusions reached in these areas are as follows.

A) A crucial problem in integrating encryption into networks is
   minimization of the mechanism which must be trusted. A
   general protocol is presented which appears to accomplish
   this goal and is suitable for either public key based or
   conventional encryption algorithms.

B) Public key and conventional encryption algorithms are
   functionally equivalent, in the sense that neither has
   any advantages over the other, either in the way they are
   used, the functions they provide, or in the amount of
   mechanism that must be trusted in their support.

C) Both public key and conventional encryption approaches to
   digital signatures depend critically on [...]
   authentication for their suitability, in ways [...]
   recognized. They appear equivalent with respect to safety.

D) Neither the signature method outlined by Rabin nor the
   public key based protocols appears satisfactory. A
   [...] network digital signature method is described.

1. Introduction

[...] encryption methods for computer networks.
Activity falls into two major but related areas: the development
of strong encryption algorithms, and the design of the rules or
protocols by which the algorithm is actually used in an operating
network. As an example of the relation between these two areas,
public key algorithms have been suggested as a superior solution
to key distribution and digital signatures; issues which, it is
claimed, would otherwise require additional protocols. Here we
concentrate on the protocol problems. We examine protocol
questions which arise at various levels of a system, from the
low, detailed level at which the various operating systems in a
network communicate, to the higher, user visible level involving
such services as digital mail. As a result a rather unique
perspective is provided, and we are led to some fairly surprising
conclusions.

The  paper  is  written  basically  in  a  bottom  up  fashion.  The 
first  section  considers  questions  of  how  encryption  "channels" 
interact  with  network  software.  The  next  section  outlines  a 
basic  protocol  for  the  use  of  encryption  in  a  network, 
independent  of  the  nature  of  the  encryption  algorithm  (public 
key,  conventional,  etc).  These  two  sections  show  how  it  is 
possible  to  build  a  secure  network  base,  on  top  of  which  many 
extensions  are  directly  possible.  At  that  point  attention  turns 
to  some  of  the  higher  level,  user  visible  issues,  such  as  public 
key  algorithms  and  digital  signatures.  It  is  argued  that  none  of 
the  currently  proposed  signature  methods  is  satisfactory.  We 
propose  an  alternative  which  we  believe  satisfies  the  necessary 
requirements.  It  is  based  on  the  existence  of  the  secure  lower 
level  protocols  discussed  in  the  earlier  sections.  Those  readers 
willing  to  accept  the  existence  of  secure  lower  level  network 
protocols  may  wish  to  skip  to  section  six,  where  the  discussion 
of  public  keys  and  digital  signatures  can  be  found. 


2. Levels of Integration


Encryption forms the basis for solutions to computer network
security problems. Basically, a single communications channel
can be multiplexed into a large number of separately protected,
secure communication channels by assigning a separate encryption
key pair for each logical communication channel. When a user
requests the establishment of a new communication, protection
policy checks can be performed, and, if successful, a key can be
distributed to each end of the communication channel.
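
The multiplexing idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `ChannelKeys`, `open_channel` and `toy_cipher` are hypothetical and do not come from the report, the policy check is reduced to a caller-supplied function, and the XOR keystream is a stand-in for a real encryption algorithm.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Stand-in cipher (assumption): XOR with a SHA-256 derived keystream.

    Applying it twice with the same key restores the input. It reuses the
    keystream for every message, so it is for illustration only.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ChannelKeys:
    """Hypothetical table giving each logical channel its own key."""

    def __init__(self, policy):
        self.policy = policy  # callable(user, peer) -> bool
        self.keys = {}        # channel id -> key shared by both ends

    def open_channel(self, user: str, peer: str) -> int:
        # The protection policy is checked before any key is handed out.
        if not self.policy(user, peer):
            raise PermissionError(f"{user} may not open a channel to {peer}")
        channel_id = len(self.keys) + 1
        self.keys[channel_id] = secrets.token_bytes(16)
        return channel_id

    def send(self, channel_id: int, cleartext: bytes) -> bytes:
        return toy_cipher(self.keys[channel_id], cleartext)

    def receive(self, channel_id: int, ciphertext: bytes) -> bytes:
        return toy_cipher(self.keys[channel_id], ciphertext)
```

Traffic sent on one channel id decrypts correctly only with that channel's key, which is the sense in which one physical line carries many separately protected channels.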

Several key distribution methods have been studied. [Popek 78b]
One method utilizes a key distribution center which [...]
for communications, and distributes keys [...]
keys which change only [...]
distributed key management, with several [...]
participating in key distribution. Recently, public key
encryption algorithms [Rivest 77a] have been proposed.
Originally, such algorithms were thought to simplify the key
distribution problem, but recent research suggests that no
savings result. [Needham 77] This issue is discussed at length
in section six.

One problem which must be resolved in designing a secure
network encryption mechanism, regardless of the nature of the
encryption algorithm or the key distribution method, is the level
of integration of the encryption facility. There are many
possible choices for the endpoints of the encryption channel in a
computer network, each with its own tradeoffs. In a packet
switched network, one could encrypt each line between two
switches separately from all other lines. This is a low level
choice, and is often called link encryption. Instead the
endpoints of the encryption channels could be chosen at a higher
architectural level: at the computer systems, referred to as
hosts, which are connected to the network. Thus the encryption
system would support host-host channels, and a message would be
encrypted only once as it was sent through the network rather
than being decrypted and reencrypted a number of times, as
implied by the low level choice. In fact, one could even choose
a higher architectural level. Endpoints could be individual
processes within the operating systems of the machines that are
attached to the network. If the user were employing an
intelligent terminal, then the terminal is a candidate for an
endpoint, too. This view envisions a single encryption channel
from the user directly to the program with which he is
interacting, even though that program might be running on a site
other than the one to which the terminal is connected. This high
level choice of endpoints is sometimes called end-end encryption.


The  choice  of  architectural  level  in  which  the  encryption  is 
to  be  integrated  has  many  ramifications  for  the  overall 
architecture.  One  of  the  more  important  is  the  combinatorics  of 
key  control  versus  the  amount  of  trusted  software. 


In general, as one considers higher and higher levels in
most systems, the number of identifiable and separately protected
entities in the system tends to increase, sometimes dramatically.
For example, while there are fewer than a hundred hosts attached
to the ARPANET, at a higher level there often are over a
thousand processes concurrently operating, each one separately
protected and controlled. The number of terminals and users is
of course also high. This numerical increase means that the
number of secure channels - that is the number of separately
distributed matched key pairs required - is correspondingly
larger. [...] In return for the additional cost and complexity
that result, there can be significant reduction in the [...]
important and must be carefully considered. It arises in the
following way. When the lowest level is chosen, the data being
communicated exists in cleartext form as it is passed from one
encrypted link to the next by the switch. Therefore the software
in the switch must be trusted not to intermix packets of
different channels. If a higher level is selected, from host to
host for example, then errors in the switches are of no
consequence. However, operating system failures are still
serious, since the data exists as cleartext while it is system
resident.
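
To make the combinatorics of key control concrete, the short calculation below counts how many channels, and hence matched key pairs, could be needed at two levels of integration. It is a worst-case sketch under assumptions that are not in the text: every pair of endpoints is taken to hold a channel, the endpoint counts of 100 and 1000 are round illustrative figures, and each duplex channel is assumed to use two key pairs, one per direction.

```python
def pairwise_channels(endpoints: int) -> int:
    """Distinct unordered pairs among `endpoints` entities: n(n-1)/2."""
    return endpoints * (endpoints - 1) // 2


def key_pairs_needed(endpoints: int, pairs_per_channel: int = 2) -> int:
    """Upper bound on matched key pairs if every pair holds a duplex channel."""
    return pairwise_channels(endpoints) * pairs_per_channel


for label, count in (("hosts", 100), ("processes", 1000)):
    print(label, pairwise_channels(count), key_pairs_needed(count))
```

With these figures, a tenfold increase in endpoints raises the possible channel count from 4,950 to 499,500, roughly a hundredfold, which is the cost side of the tradeoff against trusted software.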


In principle then, the highest level integration of
encryption is most secure. However, it is still the case that
the data must be maintained in clear form in the machine upon
which processing is done. Therefore the more classical methods
of protection within individual machines are still quite
necessary, and the value of very high level end-end encryption
may be somewhat lessened. A rather appealing choice of level
that integrates effectively with kernel structured operating
system architectures is outlined in section four.


Another small but nontrivial drawback to high level
encryption should be pointed out. Once the data is encrypted, it
is difficult to perform meaningful operations on it. Many
front end systems provide such functions as packing, character
erasures, transmission on end of line or control character
detect, etc. If the data is encrypted before it reaches the
front end, then these functions cannot be performed. That is,
any processing of data flowing through the channel must be done
above the level at which encryption takes place.


3. Protocols


Network communication protocols concern the discipline
imposed on messages sent throughout the network to control
virtually all aspects of data traffic, both in amount and
direction. It is well recognized that choice of protocol has
dramatic impacts on the utility, flexibility and bandwidth
provided by the network. Since encryption facilities essentially
provide a potentially large set of logical channels, the
protocols by which the operation of those channels is managed
also can have significant impact.

There are several important questions which any encryption
protocol must answer:

1. How is the initial cleartext/ciphertext/cleartext channel
   from sender to receiver and back established?

2. How are cleartext addresses passed by the sender around the
   encryption facilities to the network without providing a path
   by which cleartext data can be inadvertently or intentionally
   leaked by the same means?

3. What facilities are provided for error recovery and
   resynchronization of the protocol?

4. How is flow control performed?

5. How are channels closed?

6. How do the encryption protocols interact with the rest of the
   network protocols?

7. How much software is needed to implement the encryption
   protocols? Does the security of the network depend on this
   software?


One wishes a protocol which permits channels to be
dynamically opened and closed, allows the traffic flow rate to be
controlled (by the receiver presumably), provides reasonable
error handling, and all with a minimum of mechanism upon which
the security of the network depends. Clearly the more software
is involved the more one must be concerned about the safety of
the overall network. The performance resulting from use of the
protocol must compare favorably with the attainable performance
of the network using other suitable protocols without encryption.
Lastly, one would prefer a general protocol which could also be
added to existing networks, disturbing the transmission
mechanisms already in place as little as possible. Each of these
issues must be settled independent of the level of integration of
encryption which is selected, the method of key distribution, or
the nature of the encryption algorithm employed.

To  illustrate  the  ways  in  which  these  considerations 
interact,  in  the  next  section  we  outline  a  complete  protocol. 
The  case  considered  employs  an  end  to  end  architecture  in  a  way 
that  can  be  added  to  an  existing  network. 


4. Network Encryption Protocol Case Study:
Process-Process Encryption


We  outline  here  a  general  encryption  protocol  that  operates 
at  the  relatively  high  level  of  process  to  process  communication. 
A  major  goal  is  the  minimization  of  the  software  on  which  the 
security  of  the  system  depends.  Network  communication  protocols 
often  involve  fairly  large  and  complex  parts  of  the  operating 
system,  sometimes  the  primary  source  of  complexity  and  amount  of 
code. This fact results from the variety of tasks which the
network protocol must perform, such as connection establishment,
flow control, error detection and correction. Thus, this design
attempts to eliminate as much as possible the necessity of
trusting that software for secure operation.


The design presented here utilizes process-process
encryption. In process-process encryption, encoding is performed
as data moves from the source process to the system's network
software. This approach minimizes the points where data exists
in cleartext form, and thus the mechanism which needs to be
trusted. While a higher level choice could be made, for example
allowing the processes to perform their own encryption within
themselves, such a choice does not assure that all data sent to
the network is encrypted. Thus, process-process encryption seems
to be the highest safe choice. The details of the protocol are
applicable either to public key based or conventional algorithms.


[...] be supported [...] distribution methods [...] [Popek 78b]


It is assumed that the reader is familiar with the ideas of
operating system security kernels. [Popek 78c] Briefly, security
kernel based systems attempt to isolate the security relevant
parts of the system and place them in a nucleus, running on the
bare hardware. In that way, the secure operation of the system
depends only on that software. By careful design and
implementation of a security kernel, it is possible to formally
verify the security properties of the system. [Popek 78a]


4.1 Overview


In  this  protocol,  when  a  user  attempts  to  send  data,  a 
system  encrypt  function  encrypts  that  data  and  passes  it  to  the 
network  management  software,  which  is  logically  part  of  the  local 
operating  system.  The  network  software  then  attaches  headers  or 
other  information  required  by  the  network  protocols  and  sends  the 
data  to  the  communications  facility.  Upon  reception  by  the 
remote  network  software,  the  headers  and  other  protocol 
information  are  removed  from  the  data  and  the  data  is  passed,  via 
a  system  decrypt  function,  to  the  appropriate  user  process. 
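
The data path just described can be sketched as below. This is a minimal illustration, not the report's implementation: the class names `Kernel`, `NetworkSoftware` and `Packet`, the header fields, and the XOR-based `toy_cipher` are all assumptions introduced for the example. The point it tries to show is that the network software on both sides handles only ciphertext.

```python
import hashlib
from dataclasses import dataclass


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Stand-in cipher (assumption): XOR with a SHA-256 derived keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


@dataclass
class Packet:
    header: dict    # protocol information added by the network software
    payload: bytes  # already encrypted when the network software sees it


class Kernel:
    """Trusted part: holds keys and provides encrypt/decrypt functions."""

    def __init__(self, keys):
        self._keys = dict(keys)  # connection id -> key

    def encrypt(self, conn_id, cleartext):
        return toy_cipher(self._keys[conn_id], cleartext)

    def decrypt(self, conn_id, ciphertext):
        return toy_cipher(self._keys[conn_id], ciphertext)


class NetworkSoftware:
    """Untrusted part: attaches and removes headers, records what it saw."""

    def __init__(self):
        self.observed = []

    def add_headers(self, conn_id, payload):
        self.observed.append(payload)
        return Packet({"connection": conn_id, "length": len(payload)}, payload)

    def strip_headers(self, packet):
        self.observed.append(packet.payload)
        return packet.header["connection"], packet.payload


def deliver(cleartext, conn_id, src_kernel, src_net, dst_net, dst_kernel):
    """Send path: encrypt, add headers, strip headers, decrypt."""
    packet = src_net.add_headers(conn_id, src_kernel.encrypt(conn_id, cleartext))
    received_conn, payload = dst_net.strip_headers(packet)
    return dst_kernel.decrypt(received_conn, payload)
```

In this sketch a faulty or malicious `NetworkSoftware` could misroute or drop a packet, but it never holds the cleartext or the keys.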

Initial establishment of the communication channel is also
provided in a secure way. When a user process attempts to
establish communication, the local network software is informed
by the system. The network software then communicates with the
network software at the remote site. When the two network
software packages have arranged for the new communication, the
system at each site is informed. At this point in time, the
system software attempts to obtain encryption keys for this
communication. This key distribution is accomplished either with
local key management software, or via a key distribution center.
If a conventional encryption algorithm was employed, then new
keys would be chosen and distributed. If a public key encryption
algorithm was utilized, then the public key of the recipient and
the private key of the sender would be retrieved.

In the public key case, an additional authentication
sequence is required, since the public keys may have been used
before. This authentication sequence effectively establishes a
sequence number to be included in each message to guarantee that
previous messages can not be recorded by an imposter and
replayed. The authentication sequence is not required in the
conventional encryption case since the new keys effectively form
an authentication and prevent any prior messages from being
useful.
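
The replay protection described for the public key case can be sketched as a receiver-side check. This is a simplified illustration with assumptions of its own: the class name `ReplayGuard` is hypothetical, messages are assumed to arrive in order, and the sequence number is assumed to travel inside the encrypted message so that an imposter cannot alter it.

```python
class ReplayGuard:
    """Hypothetical receiver-side check on per-message sequence numbers."""

    def __init__(self, initial_sequence: int):
        # Value agreed during the authentication sequence (assumed fresh).
        self.expected = initial_sequence

    def accept(self, sequence: int) -> bool:
        # Assumes in-order delivery; anything else is treated as a replay.
        if sequence != self.expected:
            return False
        self.expected += 1
        return True


guard = ReplayGuard(initial_sequence=41)
print([guard.accept(n) for n in (41, 42, 41, 43)])  # [True, True, False, True]
```

A recorded copy of message 41 is rejected when it is presented again, because the receiver has already advanced past that number.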


[...] user processes are given capabilities to send and receive data.
The operating system calls employed should automatically encrypt
and decrypt the data with the appropriate keys. Thus, the [...]

The above design allows existing network protocols in many
cases to be largely left undisturbed, and preserves much existing
network software. If desired, user processes can be blocked, in
a reliable way, from communicating with any other user processes
anywhere in the network unless the protection policy involved in
setting up the keys permits it. Each user's communication is
protected from every other user's communication. Perhaps most
important, the amount of trusted mechanism required in the system
nucleus, as we shall see, is quite limited.


4.2 The Encryption Connection Protocol

The details of secure communication establishment,
described above, are now presented. To explain
this procedure, we first view the operation from the vantage
point of the operating system nucleus, or kernel, and then see
how host network protocol software operates making use of the
kernel facilities. For brevity, in this discussion, a logical
communication channel between two processes will be known as a
connection. The host protocol software will be referred to as the
network protocol manager (NPM). Since the NPM is often a large,
complex piece of software and requires considerable
code to implement the necessary protocols, an important goal is
not to have security depend on the NPM.

In the discussion below, it will be assumed that a pair
of matching encryption keys, one held at each end, is
involved in each channel. A
bidirectional (duplex) channel between two hosts therefore
employs two pairs. The kernel of each host in normal
operational mode has a secure duplex channel established
with each other kernel in the network. How these channels are
established concerns the method by which hosts are initialized,
and is discussed later. These channels are used for
exchanging keys that will be used for other channels between the
two hosts and for kernel-kernel messages. [2] The need for
those will become apparent as the protocol is outlined. If it is
desired, the protocols can be altered to keep the
cleartext form of keys only within the encryption units of the
hosts. For simplicity of explanation, that requirement is not
used here.

A connection will get established in the following way.
When hosts are initialized, the NPMs establish connections
through a procedure analogous to the one we outline here, and
described in more detail later. Suppose that a process wishes
to connect to a foreign process. It executes an "establish
connection" call, which informs the NPM of the request.
The NPM exchanges messages with the foreign NPM using their
already existing channel. This exchange can include any host-host
protocol for establishing connections in the network.
Presumably the NPMs eventually agree that a connection has been
established. [...] At this point, the NPMs must ask the kernel to
establish the channel for the processes. This [...]
performed [...]

[1] The same key could be used for both directions in conventional
encryption, but [...]
[2] These kernel-kernel channels could be replaced by kernel-key
distribution center secure channels.

[...] capabilities to the user process so that subsequent requests can
be made directly by the process.

In order to explain in more detail, the following four
prototype kernel calls are described. The first two are involved
in setting up the encryption channel, and presumably would be
issued only by the NPMs. The second two are the means by which
user processes send and receive data over the connection.

GID(foreign-host, connection-id, process-id, state) Give-id.
This call supplies to the kernel an id which the caller
would like to be used as the name of a channel to be
established. The kernel checks it for uniqueness before
accepting it, and also makes relevant protection checks. If
state = "init", the kernel chooses the encryption key to be
associated with the id (or queries the key controller for a key).
The entry <connection-id, key, process-id, state> is made in
the kernel Key Table. Using its secure channel, the kernel
sends <connection-id, key, policy-info> to the foreign host.
The policy-info can be anything, but in the military case,
it should be the security level of the local process
identified by process-id. In a commercial case it might be
the organization by which the user was employed. It
might also be a network-wide global name of the user
associated with the process. If state = "complete", then
there should already be an entry in the Key Table (caused by
the other host having executed a GID) so a check for match
is made before sending out the kernel-kernel message and a
key is not included. The NPM process is notified when an id
is received from a foreign kernel.


CID(connection-id) Close id. The NPM and the appropriate process
at the local site are both notified that the call has been
issued. The corresponding entry in the Key Table is deleted. Over
the secure kernel-kernel channel a message is sent telling the
other kernel to delete its corresponding Key Table entry. This
call should be executable only by the NPM or by the process whose
Key Table entry indicates that it is the process associated with
this id, to block potential denial of service problems.


Encrypt(connection-id, data) Encrypt data and buffer for NPM.
This call adds integrity information, such as sequence numbers,
to the data, encrypts the data using the key corresponding to the
supplied id (fails unless the process-id associated with the
connection-id matches that of the caller) and places the data in
an internal buffer. The NPM is informed of the awaiting data.


Decrypt(connection-id, user-buffer) Decrypt data. This call
decrypts the data from the system buffer belonging to the
connection-id supplied using the appropriate key. The data is
moved into the user's buffer. The call fails unless the
process-id stored in the Key Table matches the caller and any
data integrity checks succeed (such as sequence numbers).
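
The Encrypt and Decrypt calls above can be sketched as follows.
This is a minimal illustration under stated assumptions, not the
kernel design itself: the names Kernel, KeyTableEntry, deliver,
and toy_cipher are hypothetical, the Key Table entries are filled
in by hand rather than by GID, separate send and receive sequence
counters are assumed, and toy_cipher is a hash-based XOR stream
standing in for a real encryption algorithm.

```python
import hashlib
from dataclasses import dataclass


class ChannelError(Exception):
    """Raised when a kernel call fails a protection or integrity check."""


def toy_cipher(key: bytes, seq: int, data: bytes) -> bytes:
    """XOR data with a SHA-256 keystream. Toy stand-in, NOT a real cipher."""
    stream = b""
    block = 0
    while len(stream) < len(data):
        seed = key + seq.to_bytes(8, "big") + block.to_bytes(4, "big")
        stream += hashlib.sha256(seed).digest()
        block += 1
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass
class KeyTableEntry:
    foreign_host: str
    key: bytes
    process_id: int
    send_seq: int = 0      # assumption: one counter per direction
    recv_seq: int = 0
    usable: bool = True


class Kernel:
    def __init__(self):
        self.key_table = {}  # connection-id -> KeyTableEntry
        self.for_npm = []    # (connection-id, ciphertext) awaiting the NPM
        self.inbound = {}    # connection-id -> ciphertexts handed in by the NPM

    def _entry(self, caller_pid, conn_id):
        # Both calls fail unless the caller owns a usable channel.
        entry = self.key_table.get(conn_id)
        if entry is None or not entry.usable or entry.process_id != caller_pid:
            raise ChannelError("no usable channel for this caller")
        return entry

    def encrypt(self, caller_pid, conn_id, data: bytes):
        entry = self._entry(caller_pid, conn_id)
        seq = entry.send_seq
        entry.send_seq += 1
        framed = seq.to_bytes(8, "big") + data  # integrity info: sequence number
        self.for_npm.append((conn_id, toy_cipher(entry.key, seq, framed)))

    def deliver(self, conn_id, ciphertext: bytes):
        # Hypothetical hand-off: the NPM places received data in a system buffer.
        self.inbound.setdefault(conn_id, []).append(ciphertext)

    def decrypt(self, caller_pid, conn_id, user_buffer: bytearray):
        entry = self._entry(caller_pid, conn_id)
        queue = self.inbound.get(conn_id)
        if not queue:
            raise ChannelError("no data waiting")
        framed = toy_cipher(entry.key, entry.recv_seq, queue.pop(0))
        if int.from_bytes(framed[:8], "big") != entry.recv_seq:
            entry.usable = False  # error renders the channel unusable
            raise ChannelError("sequence check failed")
        entry.recv_seq += 1
        user_buffer[:] = framed[8:]
```

In this sketch the NPM only ever sees the contents of for_npm,
which is ciphertext, so it can move data without being trusted
with cleartext.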
An important new kernel table is the Key Table.[1] It contains
some number of entries, each of which has the following
information:

<foreign-host, connection-id, key, sequence-no, local-process-id>

There  is  one  additional  kernel  entry  point  besides  the  calls 
listed  above,  namely  the  one  caused  by  control  messages  from  the 
foreign kernel. There are two types of such messages: one
corresponding  to  the  foreign  GID  call  and  the  other  corresponding 
to  a  foreign  CID.  The  first  makes  an  incomplete  entry  in  the 
receiving  kernel's  Key  Table,  and  the  second  deletes  the 
appropriate  entry. 

The  following  sequence  of  steps  illustrates  how  a  connection 
would  be  established  using  the  encryption  connection  protocol. 
The host processors involved are numbered 1 and 2. Process A at
host 1 wishes to connect to process B at host 2.

1. Process A executes an establish connection call which informs
NPM@1, saying "conn from A to B@2". This message can be sent
locally in the clear. If confinement is important, other
methods can be employed to limit the bandwidth between A and
the NPM.

2. NPM@1 sends control messages to NPM@2, including whatever
Host-Host protocol is required.[2]

3.  NPM@2  receives  an  indication  of  message  arrival,  does  an  I/O 
call  to  retrieve  it,  examines  header,  determines  that  it  is 
recipient  and  processes  the  message. 

4. NPM@2 initiates step 2 at site 2, leading to step 3 being
executed at site 1 in response. This exchange continues
until NPM@1 and NPM@2 open the connection, having established
whatever internal local name mappings are required.

5. NPM@1 executes GID(connection-id, process-id, "init"), where
connection-id is an agreed upon connection id between the two
NPMs, and process-id is the local name of the process that
requested the connection.

6. In executing the GID, Kernel@1 generates or obtains a key,
makes an entry in its Key Table, and sends a message over its
secure channel to Kernel@2, which makes a corresponding entry in
its table and interrupts NPM@2, giving it connection-id.

7. NPM@2 issues the corresponding GID(connection-id, process-id',
"complete"), where connection-id is the same and process-id' is
the one local to host 2. This call interrupts process-id', and
eventually causes the appropriate entry to be made in the kernel
table at host 1. The making of that entry interrupts NPM@1 and
process-id@1.


[1] In some hardware encryption implementations, the keys are
kept internal to the hardware unit. In that case, the key entry
in the Key Table can merely be an index into the encryption
unit's key table.

[2] The host-host protocol messages would normally be sent
encrypted using the NPM-NPM key in most implementations.
8. Process-id and process-id' can now use the channel by issuing
succeeding Encrypt and Decrypt calls.
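
The kernel side of steps 5 through 8 can be sketched as a pair of
in-memory kernels. This is an illustrative sketch only: the class
and method names (Kernel, on_foreign_gid, on_foreign_cid,
npm_events) are hypothetical, the secure kernel-kernel channel is
modeled as a direct method call on the peer, the state names
"incomplete" and "open" are assumptions, and policy-info and
protection checks are omitted.

```python
import os


class Kernel:
    def __init__(self, name, max_entries=8):
        self.name = name
        self.peer = None           # stands in for the secure kernel-kernel channel
        self.max_entries = max_entries
        self.key_table = {}        # connection-id -> {key, process_id, state}
        self.npm_events = []       # notifications posted to the local NPM

    def gid(self, conn_id, process_id, state):
        """Give-id. Returns False when the call fails."""
        if state == "init":
            # Fails if the id is in use or the table is full.
            if conn_id in self.key_table or len(self.key_table) >= self.max_entries:
                return False
            key = os.urandom(16)   # kernel chooses the key
            self.key_table[conn_id] = {
                "key": key, "process_id": process_id, "state": "init"}
            self.peer.on_foreign_gid(conn_id, key)
            return True
        if state == "complete":
            # An entry made by the foreign GID must already exist.
            entry = self.key_table.get(conn_id)
            if entry is None or entry["state"] != "incomplete":
                return False
            entry["process_id"] = process_id
            entry["state"] = "open"
            self.peer.on_foreign_gid(conn_id, None)  # no key in this message
            return True
        return False

    def on_foreign_gid(self, conn_id, key):
        """Control message caused by a GID executed at the other host."""
        if key is not None:
            self.key_table[conn_id] = {
                "key": key, "process_id": None, "state": "incomplete"}
        else:
            self.key_table[conn_id]["state"] = "open"
        self.npm_events.append(("id-received", conn_id))

    def cid(self, conn_id):
        """Close-id: delete the local entry and tell the other kernel."""
        if self.key_table.pop(conn_id, None) is not None:
            self.npm_events.append(("closed", conn_id))
            self.peer.on_foreign_cid(conn_id)

    def on_foreign_cid(self, conn_id):
        self.key_table.pop(conn_id, None)
```

A usage example: after k1.gid("c7", 11, "init") and
k2.gid("c7", 22, "complete"), both Key Tables hold the same key
under "c7", and k1.cid("c7") removes the entry at both hosts.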

There are a number of places in the mechanisms just described
where failure can occur. If the network software in either of the
hosts fails or decides not to open the connection, no kernel
calls are involved, and standard protocols operate. A GID may
fail because the id supplied was already in use, because a
protection policy check was not successful, or because the kernel
table was full. The caller is notified and may try again. In the
case of failure of a GID, it may be necessary for the kernel to
execute most of the actions of CID to avoid race conditions that
can result from other methods of indicating failure to the
foreign site.


4.3 Discussion


The  encryption  mechanism  just  outlined  contains  no  error 
correction  facilities.  If  messages  are  lost,  or  sequence  numbers 
are  out  of  order  or  duplicated,  the  kernel  merely  notifies  the 
user  and  network  software  of  the  error  and  renders  the  channel 
unusable.  This  action  is  taken  on  all  channels,  including  the 
kernel-kernel channels. For every case but the last, CIDs must
be  issued  and  a  new  channel  created  via  GIDs.  In  the  last  case, 
the  procedures  for  bringing  up  the  network  must  be  used. 

This simple-minded view is acceptable in part because the
expected  error  rate  on  most  networks  is  quite  low.  Otherwise,  it 
would  be  too  expensive  to  reestablish  the  channel  for  each  error. 
However,  it  should  be  noted  that  any  higher  level  protocol  errors 
are  still  handled  by  that  protocol  software,  so  that  most 
failures  can  be  managed  by  the  NPM  without  affecting  the 
encryption  channel.  On  highly  error  prone  channels,  additional 
protocol  at  the  encryption  level  may  still  be  necessary.  See 
Kent [Kent 76] for a discussion of resynchronization of the
sequencing  supported  by  the  encryption  channel. 

From the protection viewpoint, one can consider the collection of
NPMs across the network as forming a single (distributed) domain.
They may exchange information freely among themselves. No user
process can send or receive data directly to or from an NPM
except via narrow bandwidth channels on which control information
is sent to the NPM and status and error information is returned.
These channels can be limited by adding parameterized calls to
the kernel to pass the minimum amount of data to the NPMs, and
having the kernel post, as much as possible, status reports
directly to the processes involved. The channel bandwidth cannot
be zero, however.


4.4 System Initialization Procedures

The task of bringing up the network software is composed of
two important parts. First, it is necessary to establish keys
for the secure kernel-kernel channels and the NPM-NPM channels.
Next, the NPM can initialize itself and its communications with
other  NPMs.  Finally,  the  kernel  can  initialize  its 
communications  with  other  kernels.  This  latter  problem  is 
essentially  one  of  mutual  authentication,  of  each  kernel  with  the 
other  member  of  the  pair,  and  appropriate  solutions  depend  upon 
the  expected  threats  against  which  protection  is  desired. 

The initialization of the kernel-kernel channel and NPM-NPM
channel key table entries will require that the kernel maintain
initial keys for this purpose. The kernel cannot obtain these
keys using the above mechanisms at initialization because they
require the prior existence of the NPM-NPM and kernel-kernel
channels. Thus, this circularity requires the kernel to maintain
at least two key pairs.[1] However, such keys could be kept in
read only memory of the encryption unit if desired.

The initialization of the NPM-NPM communications then proceeds
as it would if encryption were not present. In most networks,
some form of host-host reset command would be sent (encrypted
with the proper NPM-NPM key). Once this NPM-NPM initialization
is complete, the kernel-kernel connections could be established
by the NPM. At this point, the system would be ready for new
connection establishment. It should be noted that, if desired,
the kernels could then set up new keys for the kernel-kernel and
NPM-NPM channels, thus using the initialization keys only for a
short time. To avoid overhead at initialization time, and to
limit the sizes of kernel Key Tables, NPMs probably should
establish channels with other NPMs only when a user wants to
connect to that particular foreign site, and perhaps close the
NPM-NPM channel after all user channels are closed.


4.5 Symmetry

The case study just presented portrayed a basically symmetric
protocol suitable for use by intelligent nodes, a fairly general
case. In some instances, one of the pair may lack algorithmic
capacity, as illustrated by simple hardware terminals or simple
microprocessors. Then a strongly asymmetric protocol is required,
where the burden falls on the more powerful of the pair.


A form of this problem might also occur if encryption is not
handled by the system, but rather by the user processes
themselves. Then for certain operations, such as sending mail,
the receiving user process might not even be present. (Note that
such an approach may not guarantee the encryption of all network


[1] In a centralized key distribution version, the only keys
which would be needed would be those for the key distributor
NPM-host NPM channel and for the key distributor kernel-host
kernel channel. In a distributed key management system, keys
would be needed for each key manager.
traffic.) Schroeder and Needham have sketched protocols that are
similar  in  spirit  to  those  presented  here  to  deal  with  such 
cases. 

5. Datagrams

The  case  of  electronic  mail  illustrates  an  important 
variation  to  the  protocols  presented  earlier.  Assume  that  a  user 
at one site wishes to send mail to a user at another site.

Using  conventional  encryption  algorithms,  the  first  user 
would  request  a  connection  to  the  second  user,  and  a  new  key 
would  be  chosen  and  distributed  by  the  key  controller  for  use  in 
the  communication.  That  key  is  sent  using  the  secret  keys  of  the 
two  users. 


However,  since  the  second  user  may  not  be  signed  on  at  the 
time,  a  daemon  process  is  used  to  receive  the  mail  and  deliver  it 
to  the  user's  "mailbox"  file  for  his  later  inspection.  It  is 
desirable  that  the  daemon  process  not  need  to  access  the 
cleartext  form  of  the  mail,  for  that  would  require  the  mail 
receiver  mechanism  to  be  trusted.  This  feat  can  be  accomplished 
by  sending  the  mail  to  the  daemon  process  in  encrypted  form  and 
having  the  daemon  put  that  encrypted  data  directly  into  the 
mailbox file. The user can decrypt it when he signs on to read
his  mail.  In  that  way,  the  daemon  only  needs  the  ability  to 
append  to  a  user's  mailbox  file. 


In  order  for  the  user  to  know  the  new  key  used  for  this 
mail,  however,  the  key  distribution  algorithm  used  earlier  must 
be  modified.  Rather  than  sending  the  key  for  this  connection  to 
both  the  sender  and  the  receiver,  the  key  controller  sends  the 
key  twice  to  the  sender,  one  copy  encrypted  with  the  sender's 
secret  key  and  one  copy  encrypted  with  the  receiver's.  The 
sender  can  prepend  the  copy  of  the  key  encrypted  in  the 
receiver's secret key to the mail before transmission. When the
recipient signs on, his own mail program will examine the
mailbox file, find the key, decrypt it using his secret key, and
then use the new key to decrypt the remaining text.
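
The modified key distribution for mail can be sketched as
follows. This is a hedged illustration, not a definitive
implementation: KeyController, send_mail, daemon_append,
read_mail, seal, and unseal are hypothetical names, and seal is a
hash-based XOR stream with a random nonce that merely stands in
for a strong conventional algorithm.

```python
import hashlib
import os

NONCE_LEN = 8
KEY_LEN = 16
WRAPPED_LEN = NONCE_LEN + KEY_LEN  # size of a mail key sealed under a secret key


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def seal(key: bytes, data: bytes) -> bytes:
    """Toy encryption (NOT secure): random nonce followed by XORed data."""
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def unseal(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


class KeyController:
    """Holds each user's secret key and hands out per-mail keys."""

    def __init__(self):
        self._secret = {}

    def enroll(self, user: str) -> bytes:
        self._secret[user] = os.urandom(KEY_LEN)
        return self._secret[user]

    def mail_key(self, sender: str, receiver: str):
        # Both copies of the new key go to the sender.
        key = os.urandom(KEY_LEN)
        return seal(self._secret[sender], key), seal(self._secret[receiver], key)


def send_mail(sender_secret, copy_for_sender, copy_for_receiver, text: bytes):
    mail_key = unseal(sender_secret, copy_for_sender)
    # Prepend the receiver's sealed copy of the key to the encrypted mail.
    return copy_for_receiver + seal(mail_key, text)


def daemon_append(mailbox: list, blob: bytes):
    # The daemon only appends; it never sees cleartext or keys.
    mailbox.append(blob)


def read_mail(receiver_secret, blob: bytes) -> bytes:
    mail_key = unseal(receiver_secret, blob[:WRAPPED_LEN])
    return unseal(mail_key, blob[WRAPPED_LEN:])
```

The daemon needs only append access to the mailbox, which matches
the requirement that the mail receiver mechanism not be trusted.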


In the case of public key encryption algorithms, the mail
problem is somewhat simpler, since the recipient knows what key
to use in decryption (his secret key). However, authentication
is not possible since the recipient is not present when the
message is received. Thus, it may be a replay of a previously
sent message. This problem can be prevented in the conventional
encryption algorithm case via various protocols with the key
managers, for example, by timestamping the mail and having the
recipient keep track of recently used mail keys.
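
One way the timestamp and recently-used-key check might look is
sketched below. The class name MailReplayGuard, the acceptance
window, and the use of an opaque key identifier are assumptions
made for illustration; the sketch also assumes the timestamp is
carried inside the encrypted mail so that it cannot be altered.

```python
class MailReplayGuard:
    """Recipient-side record of recently used mail keys (illustrative)."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.seen = {}  # mail key identifier -> timestamp of the mail

    def accept(self, key_id: bytes, timestamp: float, now: float) -> bool:
        # Keys whose mail is older than the window can be forgotten:
        # a replay of that mail would be rejected as stale anyway.
        self.seen = {k: t for k, t in self.seen.items()
                     if now - t <= self.window}
        if now - timestamp > self.window:
            return False          # stale mail
        if key_id in self.seen:
            return False          # mail key already used: replay
        self.seen[key_id] = timestamp
        return True
```

The window trades storage against tolerance for mail that sits
unread; a recipient who signs on rarely would need a long window.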


Both  mechanisms  outlined  above  do  guarantee  that  only  the 
desired  recipient  of  a  message  will  be  able  to  read  it.  However, 
as  pointed  out,  they  don't  guarantee  to  the  recipient  the 
identity  of  the  sender.  This  problem  is  essentially  that  of 
digital  signatures,  and  is  discussed  in  the  next  section. 


Public Key Algorithms and Digital Signatures

The development of public key based encryption was greeted
by  a  great  deal  of  interest,  since  the  method  appears  to  present 
considerable  advantages  over  conventional  encryption  methods, 
especially  with  respect  to  key  distribution  and  digital  mail 
signatures. 


However,  on  closer  examination,  it  seems  that  public  key 
algorithms  possess  no  particular  advantages  over  conventional 
algorithms. The reasons for this conclusion are readily seen and
are  outlined  below. 


6.1 Key Distribution

Let us examine each of the advantages claimed for public key
algorithms. The first is key distribution. Simply put, public
key advocates argue that an automated "telephone book" of public
keys can generally be made available, and therefore whenever user
x wishes to communicate with user y, x merely must look up y's
public key in the book, encrypt the message with that key, and
send it to y. [Diffie 76] Therefore there is no key distribution
problem at all. Further, no central authority is required
initially to set up the channel between x and y.


Needham and Schroeder point out however that this viewpoint
is incorrect: some form of a central authority is needed and the
protocol involved is no simpler nor any more efficient than one
based on conventional algorithms. [Needham 77] Their argument may
be summarized as follows. First, the safety of the public key
scheme depends critically on the correct public key being
selected by the sender. If the key listed with a name in the
"telephone book" is the wrong one, then there is no security.
Furthermore, maintenance of the (by necessity machine supported)
book is nontrivial because keys will change, either because of
the natural desire to replace a key pair which has been used for
a large amount of data transmission, or because a key has been
compromised through a variety of ways. There must be some source
of carefully maintained "books" with the responsibility of
carefully authenticating any changes and correctly sending out
public keys (or entire copies of the book) upon request.


Needham and Schroeder also exhibit protocols to provide the
desired properties for public key systems, and show that there
are equivalent protocols for conventional algorithms. The
protocols are equivalent both in terms of numbers of messages
required as well as in the mechanisms which must be trusted. The
only  observable  difference  is  that  the  central  authority  in  the 
conventional  case,  in  addition  to  being  trusted,  must  also  keep 
its  collection  of  (conventional)  keys  secret.  Based  on  the  work 
at  UCLA  on  secure  operating  systems,  it  appears  that  the  task  of 
constructing  a  secure  central  authority  is  no  harder  than 
building  the  correct  one  needed  for  public  key  systems. 


6.2 Signatures

The second area in which public key methods are often thought
to be superior to conventional algorithms is digital signatures.
The method, assuming a suitable public key algorithm, is for the
sender to encode the mail by "decrypting" it with his private
key and then send it. The receiver decodes the message by
"encrypting" with the sender's public key. The usual view is that
this procedure does not require a central authority except to
adjudicate an authorship challenge. However, two points should
be noted. First, a central authority is needed by the recipient
for aid in deciphering the first message received from any given
author (to get the corresponding public key, as above). Second,
the central authority must keep all old values of public keys in
a reliable way to properly adjudicate conflicts over old
signatures (consider the relevant lifetime of a signature on a
real estate deed, for example). [Needham 77]

Further, and more serious, the unadorned public key signature
protocol just described has an important flaw. The author of
signed messages can effectively disavow and repudiate his
signatures at any time, merely by causing his secret key to be
made public, or "compromised". When such an event occurs, either
by accident or intention, all messages previously signed using
the given private key are invalidated, since the only proof of
validity has been destroyed. Because the private key is now
known, anyone could have created any of the messages sent earlier
by the given author. None of the signatures can be relied upon.

Hence the validity of a signature on a message is only as safe
as the entire future history of protection of the private key!
Further, the ability to remove the protection resides in
precisely the individual (the author) who should not hold that
right. That is, one important purpose of a signature is to
indicate responsibility for the content of the accompanying
message in a way that cannot be later disavowed.

Some people may argue that this concern is overly conservative:
that existing signature methods are not very reliable, that
individuals have considerable incentive not to repudiate their
signature, and so one is justified in [...]. This is clearly
unsatisfactory, especially if it is possible to devise suitable
digital signature methods which do not suffer from this problem.

The situation with respect to signatures using conventional
algorithms initially appears slightly better. Rabin [Rabin 78]
proposes elsewhere in this volume a method of digital signatures
based on any strong conventional algorithm. Like public key
methods, it too requires either a central authority or an
explicit agreement between the two parties involved to get
matters going.[1] Similarly, an adjudicator is required for
challenges. Rabin's method however uses a large number of keys,
with keys not being reused from message to message. As a result,
if a few keys are compromised, other signatures based on other
keys are still safe. However, that is not a real advantage over
public key methods, since one could readily add a layer of
protocol over the public key method to change keys for each
message, as Rabin does for conventional methods. One could even
use Rabin's scheme itself with public keys, although it is easy
to develop a simpler one.

However, all of the digital signature methods described or
suggested above suffer from the problem of repudiation of
signature via key compromise. Rabin's protocol or analogues to
it merely limit the damage (or, equivalently, provide
selectivity!). It appears that the problem is intrinsic to any
approach in which the validity of an author's signature depends
on secret information, which can potentially be revealed, either
by the author or other interested parties. Surely improvement
would be desirable.


6.3 A Reliable Digital Signature Method

A simple, obvious solution is to interpose some interpretive
layer between the author and his signature keys, whatever their
form. For example, suppose the list of keys in Rabin's algorithm
were not known to the author, but instead were contained in a
secure Unit (hardware or software). When the author wished to
send a signed message, he merely submitted the message to the
Unit, which selected the appropriate keys and then used the
standard algorithm. Each author has access to such a Unit.


The loading of each Unit requires some examination. In
particular, the means which are used to select keys and insert
them into each Unit must be handled satisfactorily. That is,
there must be some trusted source of keys (and matching "standard
message" in the Rabin protocol), and the key list for each
author/recipient pair must be deliverable in a correct, secret
way to the appropriate Units. These components, with their
internal communication protocols, constitute a Network Registry
(NR). Such an NR appears required to solve the problems raised
earlier. Note that some secure communication protocol among the


[1] In his paper, Rabin describes an initialization method which
involves an explicit contract between each pair of parties that
wish to communicate with digitally signed messages. One could
easily instead add a central authority to play this role, using
suitable authentication protocols, thus obviating any need for
two parties to make specific arrangements prior to exchanging
signed correspondence.
components of the Network Registry is required. However, it can
be  very  simple;  low  level  link  encryption  would  suffice. 

For  safety  and  efficiency,  the  NR  functions  presumably 
should  be  decomposed  and  distributed  throughout  the  network.  In 
particular,  the  failure  or  compromise  of  a  local  NR  would  then 
only  have  local  consequences.  One  can  even  construct  local  NR 
components  of  the  Network  Registry  in  a  decentralized  way  so  that 
compromise  of  more  than  one  component  would  be  required  before  a 
message  signature  was  affected. [1]  The  NR  architecture  issue, 
while  important,  is  to  some  degree  a  digression  here  and  so  we 
put it aside.

The  Registry  concept  is  quite  common  in  the  paper  world.  A 
local  government's  real  estate  recorder's  office  is  probably  the 
most  commonly  known  example. 


6.4 Authentication

We now make an important observation. It is still necessary
that there exist a guaranteed authentication mechanism by which
an individual is authenticated to the NR (presumably directly to
the local Unit). Any reasonable communication system of course
ultimately requires such a facility, for if one user can
masquerade as another, all signature systems will fail. What is
required is some reliable way to identify a user sitting at a
terminal -- some method stronger than the password schemes used
today. Perhaps an unforgeable mechanism based on fingerprints or
other personal characteristics will emerge.


6.5 Simplification of the Proposed Signature Architecture:
Specialized Digital Signature Protocols Unnecessary

Once the necessity of a Network Registry is recognized,
including a guaranteed authentication mechanism, it appears
possible to remove the need for specialized digital signature
protocols. Instead, any of a collection of simple methods will
suffice.

For the Network Registry to operate satisfactorily (including
performing user authentication), it clearly must be distributed,
and clearly must be able to communicate securely internally among
the distributed components. Given that such facilities exist,
then the following is an example of a simple implementation of
digital signatures which does not require a specialized protocol
or encryption algorithm:


1.  The  author  authenticates  with  a  local  Network  Registry 


[1] See section 6.6.
component, creates a message, and hands the message to the NR
together  with  the  recipient  identifier  and  an  indication  that  a 
registered  signature  is  desired. 

2.  A  Network  Registry  (not  necessarily  the  local  component) 
computes  a  simple  characteristic  function  of  the  message,  author, 
recipient,  and  current  time,  encrypts  the  result  with  a  key  known 
only to the Network Registry, and forwards the resulting
"signature  block"  to  the  recipient.  The  NR  only  retains  the 
encryption  key  employed. 

3. The recipient, when the message is received, can ask the NR
if the message was indeed signed by the claimed author by
presenting the signature block and message. Subsequent
challenges are handled in the same way.
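
The three steps above can be sketched as follows. This is a
minimal sketch under stated assumptions, not the proposed
implementation: the NetworkRegistry class and its method names
are hypothetical, authentication is reduced to a placeholder, and
a keyed hash (HMAC-SHA256 from the Python standard library) is
swapped in for "a characteristic function encrypted with a key
known only to the Network Registry".

```python
import hashlib
import hmac
import json
import os


class NetworkRegistry:
    def __init__(self):
        self._key = os.urandom(32)   # known only to the Registry
        self._authenticated = set()

    def authenticate(self, author: str):
        # Placeholder for the guaranteed authentication mechanism.
        self._authenticated.add(author)

    def _tag(self, message: bytes, author, recipient, timestamp) -> str:
        # Characteristic function of message, author, recipient, and time,
        # bound to the Registry key (HMAC used here as a stand-in).
        digest = hashlib.sha256(message).hexdigest()
        fields = json.dumps([digest, author, recipient, timestamp]).encode()
        return hmac.new(self._key, fields, hashlib.sha256).hexdigest()

    def sign(self, author, recipient, message: bytes, timestamp: float) -> dict:
        """Steps 1 and 2: build the signature block for the recipient."""
        if author not in self._authenticated:
            raise PermissionError("author is not authenticated")
        return {"author": author, "recipient": recipient, "time": timestamp,
                "tag": self._tag(message, author, recipient, timestamp)}

    def verify(self, block: dict, message: bytes) -> bool:
        """Step 3: recompute the tag; the Registry stores no blocks."""
        expected = self._tag(message, block["author"],
                             block["recipient"], block["time"])
        return hmac.compare_digest(expected, block["tag"])
```

Because verification only recomputes the tag from the presented
block and message, the Registry retains nothing per message
beyond its key.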

This simple protocol involves little additional mechanism
beyond that which was needed by the Network Registry anyway. It
does require that the Network Registry be involved in every
message signature and validation. However, recall that all of
the unadorned signature methods reviewed earlier require
involvement of some form of a Network Registry for at least the
first message between any two parties. Public key protocols must
check the "telephone book", and Rabin's method requires either a
contract or a Network Registry. Furthermore, when one adds a
more complete Network Registry on top of those other signature
methods to correct their repudiation problem, all methods involve
the NR for each message. Note that this protocol also does not
require the NR to maintain any significant storage for signature
blocks.


6.6 Performance and Safety


Certain elementary precautions should be taken in the design
of the Network Registry to avoid unnecessary internal message
exchanges and to assure safety of the keys used to encrypt the
signature blocks. Performance enhancements presumably involve
[...] signature block calculation. Safety enhancements could
include the use of different keys at each distributed site,
replicating sites, and employing a signature block computation
which requires the cooperation of multiple sites. [...] they are
not discussed further here.

From the preceding discussion, we conclude that the digital
signature algorithms proposed heretofore are unsatisfactory, and
the improvements required to correct their inadequacies make the
use of a specialized digital signature algorithm unnecessary.


We  note  here  that  the  safety  of  signatures  in  this  proposal 
also  depends  on  the  future  history  of  protection  of  keys  as 
before,  in  this  case  those  held  by  the  Network  Registry. 
However,  there  are  several  crucial  differences  between  this  case 
and  previous  proposals.  First,  the  authors  of  messages  do  not 
retain  the  ability  to  repudiate  signatures  at  will.  Second,  the 
Network Registry can be structured so that failure or compromise
of  several  of  the  components  is  necessary  before  signature 
validity  is  lost.  In  the  previous  methods,  a  single  failure 
could  lead  to  compromise. 


7. Conclusions

We  draw  a  number  of  specific  conclusions,  as  well  as  more 
general  perspectives  from  the  preceding  discussions.  The 
specifics  are  as  follows.  First,  public  key  encryption  systems, 
viewed  in  the  context  of  the  network  protocols  by  which  they  must 
be  used,  do  not  seem  to  provide  any  significant  advantages  over 
conventional  encryption  algorithms.  Each  important  function  that 
has  been  recognized  can  be  performed  at  least  as  easily  by 
conventional  methods  with,  it  appears,  no  more  supporting 
mechanism.  Therefore,  if  strong  conventional  algorithms  are 
easier to develop, as has been speculated [Rivest 77b], research
would be better devoted to that area rather than public key
systems.

Second,  it  seems  that  the  digital  signature  methods  which 
have  been  proposed,  both  public  key  and  conventional  algorithm 
based,  do  not  adequately  protect  recipients  of  signed  documents 
from repudiation of signatures by the author revealing the secret
key(s) employed. The difficulty appears intrinsic to the
approaches being taken. An alternative that overcomes this
problem is available, however; it involves a small amount of
trusted software.

Third,  the  necessary  underlying  mechanism  required  to 
support  improved  digital  signature  methods,  as  well  as  other  user 
visible secure network communication protocols, is relatively well
understood, and an example is presented in this paper. The
example takes account of the important requirement that the
amount of trusted mechanism involved be minimized for the sake of
safety.


In more global terms, the discussion in this paper, which
has been intended to illustrate the current state of the art,
suggests the following general perspectives.


If network security is viewed under a common carrier
philosophy, then the general principles by which
secure, common carrier based, point to point communication can be
provided are reasonably well in hand. Of course, in any
sophisticated implementation, there will surely be considerable
careful engineering to be done.


However,  this  conclusion  rests  on  one  important  assumption 
that  is  not  universally  valid.  Either  there  exist  secure 
operating systems to support the individual processes and the
required encryption protocol facilities, or each machine operates
as a single protection domain. A secure implementation of a Key
Distribution Center or Registry is necessary in any case.
Fortunately, reasonably secure operating systems are well on
their way, so that this intrinsic dependency of network security
on  an  appropriate  operating  system  base  should  not  seriously 
delay  common  carrier  security. 


One could however, take a rather different view of the
nature of the network security problem: the goal might be to
provide a high level extended machine for the user, in which no
explicit awareness of the network is required. The underlying
facility is trusted to securely move data from site to site as
necessary to support whatever data types and operations are
relevant to the user. The facility operates securely and with
integrity in the face of unplanned crashes of any nodes in the
network. Synchronization of operations on user meaningful
objects (such as Withdrawal on Checking Account) is reliably
maintained. If one takes such a high level view of the goal of
network security, then the simple common carrier solutions
respond only to part of the network security problem and more
work remains.
ABSTRACT - A locking protocol to coordinate access to a distributed database and to maintain system


KEY WORDS AND PHRASES: concurrency, crash recovery, distributed databases, locking protocol,
consistency.


This  paper  is  concerned  with  issues  of 
resource  coordination  in  distributed  systems,  and 
the  maintenance  of  system  consistency  throughout 
normal  and  abnormal  conditions.  A  database  is  said 
to be in a consistent state if all the data items
satisfy a set of established assertions or
consistency constraints. A database subject to
multiple  access  requires  that  accesses  to  it  be 
properly  coordinated  in  order  to  preserve 
consistency.  Coordination  of  resources  in  a 
distributed  environment  exhibits  additional 
complexity  over  resource  coordination  in 
centralized  environments  due  to: 


1. possibility of crashes of participating
sites and/or communication links.
Such failures can render the
database inconsistent if not appropriately
handled by the coordination algorithm.

2. network partitioning: in general, it is
not possible to distinguish between
messages which could not be delivered due to
a crash of the recipient site and
undelivered messages due to network
partitioning. Therefore, network
partitioning in the more general sense
considered here is not simply a matter of

* This research was supported by the Advanced
Research Projects Agency of the Department of
Defense under Contract MDA 903-77-C-0211.

† Partially supported by the Conselho Nacional de
Desenvolvimento Cientifico e Tecnologico, CNPq,
Brazil.


proper  network  topology  design.  It  turns 
out  that  detection  of  network  partitioning 
can  only  occur  at  network  reconnection 
time. 

3.  inherent  communication  delay:  the  time  to 
get  a  message  through  a  computer 
communication  network  may  be  arbitrarily 
long,  although  finite.  Therefore  any 
proposed  solution  should  operate  correctly 
regardless  of  the  delay  experienced  by  any 
message,  and  in  general  should  be 
efficient . 


A protocol to coordinate concurrent access to
a distributed database using locking is presented
in this paper. The algorithm has as its core a
centralized locking protocol with distributed
recovery procedures. A centralized controller with
local appendages at each site coordinates all
resource control, with requests initiated by
application programs at any site. Recovery is
broken down into three disjoint mechanisms: for
single node recovery, merge of partitions, and
reconstruction of the centralized controller and
tables.

Among the properties of the proposed protocol
we have:


a. robustness in the face of crashes of any
participating site, as well as
communication failures, is provided. The
protocol can recover from any number of
failures which occur either during normal
operation or during any of the three
recovery processes.


b. deadlock prevention and detection
methods can be easily integrated given the
centralized control characteristic of the
proposed algorithm.

c. straightforward integration of
locking methods [1] is permitted. Value
dependent lock specification at the
logical level is necessary to avoid the
problems of "phantom tuples" discussed by
Eswaran et al [1]. Other locking
disciplines may also be easily supported.

d. continued local operation in the face of
network partitioning is supported. The
locking algorithm operates, and operates
correctly, when the network is
partitioned, either intentionally or by
failure of communication lines. Each
partition is able to continue with work
local to it, and operation merges
gracefully when the partitions are
reconnected.

e. performance of the algorithm does not
degrade operations. It is shown in this
paper that for many topologies of
interest, the delay introduced by the
protocol is not a direct function of the
size of the network. The communication
cost is shown to grow in a relatively
slow, linear fashion with the number of
sites in the network.

f. the correct operation of the protocol in
the face of the failures mentioned before
can be proven in a straightforward way.

Several other approaches for synchronization
in distributed databases have been suggested in the
literature, but none deal satisfactorily with all
of these issues.

The Majority Consensus protocol proposed by
Thomas [8] requires the sites involved in a
transaction to agree by majority vote whether to
proceed. Timestamps on data items at each site
indicate whether the item is current and therefore
whether a transaction based on it can be approved.
This protocol is quite elegant, with
attractive behavior in the face of failures,
especially for fully replicated databases.
Unfortunately, for the cases considered in this
paper, it presents several drawbacks. [...]
transactions which conflict lead to multiple
resubmission of each.

Synchronization in SDD-1 [...] is accomplished by
several different protocols designed to co-exist
with one another. The simpler ones can be used for
certain restricted classes of transactions defined in
advance of system generation, in which case
significant improvements in cost and delay over the
more general protocols result. Otherwise however,
we recommend our protocol since its performance is
absolutely better and issues such as robustness and
crash recovery, not handled by SDD-1, are
considered fully.

A ring structured solution is proposed by
Ellis [6]. It uses sequential propagation of
synchronization and update messages along a
statically determined circular ordering of the
nodes. Two round trips are required for each
update. This protocol, while in general much
slower than the others mentioned above, is quite
simple, and Ellis has employed formal verification
procedures to show its correctness. Unfortunately
however, failures and error recovery are not
addressed by the protocol.

Other proposed schemes, called primary copy
strategies, have been suggested in [3], [5], [7].
Alsberg in [3] introduced some techniques aimed at
bringing a certain degree of resiliency to the
single primary, multiple backup strategies
discussed in [5] and [7]. The primary copy scheme
is primarily designed to maintain mutual
consistency of databases subject to
limited types of update operations, but it does
not address explicitly the problem of internal
consistency of a distributed system supporting
general transactions.

The protocol presented in this paper is
described in an intuitive manner in section one,
followed by a more detailed description in the two
subsequent sections. An algorithmic specification
of this locking protocol can be found in [...]. An
informal proof of the correctness of the algorithm
is presented here. The proof is decomposed into
five major parts, one for normal operation, three
for the recovery phases, and a last part that shows
the parts actually can be proved disjointly.

The paper concludes with a proposal for an
extension aimed at optimizing the
algorithm to adapt to highly skewed distributions
of activity. The extension applies nicely to
interconnected computer networks.

1 - Centralized Lock Controller Protocol: Intuitive Description

Consider a database
distributed among n nodes of a computer network,
numbered from 1 to n. We assume that the network
protocols are such that a copy of a message is kept
by its sender until an acknowledgment for it is
received. Messages may have to be
retransmitted many times until they are
acknowledged. An implication of these assumptions is that
messages may be delayed by an arbitrary but finite
amount of time. We also assume that messages from
a site A are received at a site
B in the same order they were generated. However,
we make no assumptions about the order in which
messages from two distinct sources are received by
a third one. We require that the network routing
procedures be such that every pair of sites can
communicate with each other if the necessary
physical connection is available.

User interaction with the database is done
through application programs or APs, which in turn
interact with the processes of the Database Management
System. Of those processes, two are of interest
for this locking protocol: the 'centralized lock
controller', or simply 'lock controller', and the
'local lock controller'.

As a first approximation assume that there is
only one lock controller or LC for the entire
network. This process is responsible, among other
things, for examining lock and lock release requests
from the APs and deciding whether they can be
granted or not. For this purpose, the LC maintains
a table called the LOCK table, which is a set of
all the active locks. Each entry in this table is
a 3-tuple of the form (H,T,P) where H is a unique
host identification, T is a unique transaction
identifier within each site and P is a description
of the logical portion of the database to be locked
as well as the lock mode (e.g., read, write,
etc.). In a relational database, the lock
may for example be a predicate lock as described by
Eswaran et al [1].
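
The LOCK table described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's specification: the portion P is approximated by a set of item names rather than a predicate, the only modes are "read" and "write" with shared reads and exclusive writes, and the names Lock, LockTable and conflicts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lock:
    host: int            # H: identification of the requesting site
    transaction: int     # T: transaction identifier, unique within the site
    portion: frozenset   # part of P: items covered (stand-in for a predicate)
    mode: str            # part of P: "read" or "write" (assumed modes)


def conflicts(a: Lock, b: Lock) -> bool:
    """Assumed rule: overlapping portions conflict unless both locks are reads."""
    if (a.host, a.transaction) == (b.host, b.transaction):
        return False  # assumption: a transaction never conflicts with itself
    if a.mode == "read" and b.mode == "read":
        return False
    return bool(a.portion & b.portion)


class LockTable:
    """The set of all active locks, as kept by the LC."""

    def __init__(self):
        self.locks = set()

    def can_grant(self, request: Lock) -> bool:
        return not any(conflicts(request, held) for held in self.locks)

    def grant(self, request: Lock) -> bool:
        if not self.can_grant(request):
            return False
        self.locks.add(request)
        return True

    def release(self, lock: Lock) -> None:
        self.locks.discard(lock)
```

In this sketch two readers of the same item coexist, while a writer is refused until both readers have released their locks.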

At every site, except for the one where the LC
is located, there is a local lock controller
or LLC. Those processes are responsible for
maintaining a local copy of the LOCK table. Any
LLC may become the lock controller whenever there
is a crash in the system which makes the LC
unavailable. The recovery process is explained
later in detail. Each time a transaction takes an
action on the database, the
local copy of the LOCK table is checked
to determine whether the action can be performed or
not. Therefore, there are two reasons for keeping
a local copy of the LOCK table, namely: resilience
to failures and local action checking.

It is convenient at this point to introduce
the notion of logical partition or logical
component, as opposed to that of a physical
component. A physical component is a maximal
subnetwork such that every pair of sites in the
component can communicate with one another. It can
be readily seen that the composition of a physical
component is not under the control of the locking
protocol, since nodes and links may fail
independently of the protocol operation. Such a
lack of control could make the operation of the
protocol, in the face of crashes, rather complex.
The concept of logical component is introduced
to insulate the protocol from unexpected
changes in the composition of each physical
component. To this end, each LC keeps a list of
sites which it thinks are still up, called the up
list. A logical component is defined as being the
subnetwork generated by the nodes which are in the
up list and which are actually up. Independence from the
composition of physical components is achieved
by controlling the way by which the up list
maps into the former, in a way which is explained
later in the paper.

Since one of our stated goals is to allow
local operations to continue in face of network
partitioning and to allow partitions to merge
gracefully, it is necessary for each partition to
have its own LC. There is one LC for each logical
component.


The operation of the locking protocol under
no-crash conditions can be intuitively explained as
follows. The LC receives lock and lock release
requests from the application programs. Each
request is sent to all LLCs in the component. The
request is stored in a pending list at each LLC
site and an acknowledgment is sent back to the LC.
After the acknowledgment from all sites in the
component is received (excluding those which
crashed in the meantime) a confirmation for the
request is sent by the LC to all LLCs, causing the
request to be deleted from the pending list and
appended to the LOCK table.

A lock request may be rejected by an LC if it
conflicts with other locks in the LOCK table or if
the request is not local to the component. We
assume that the LC is able to determine for each
lock P, the set, LOC(P), of sites where the data
to be locked are stored. Thus, a lock P is said to
be local if LOC(P) is contained in the up list of
the component. The set LOC(P) can be determined by
the LC by checking some catalogs. The organization
of those catalogs is not relevant here; see [9] and
[10] for discussions of that subject.

Every time that a site or a set of sites drop
out of the up list, all the locks which are not
local any more are released and all the
transactions which had at least one lock released
will be aborted or backed up. In this way
locality of operations is enforced by the CLC
protocol.
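
The locality rule and its enforcement when sites leave the up list can be illustrated with a short Python sketch. The data layout is an assumption made for the example: locks are keyed by a hypothetical lock identifier mapped to the owning transaction, and LOC(P) is supplied as a dictionary instead of being looked up in catalogs. The text does not say what happens to the remaining locks of an aborted transaction, so the sketch leaves them in place.

```python
def is_local(loc_p, up_list):
    """A lock P is local if LOC(P) is contained in the up list."""
    return set(loc_p) <= set(up_list)


def drop_sites(lock_table, up_list, crashed, loc):
    """Remove crashed sites from the up list and release non-local locks.

    lock_table: dict mapping lock id -> owning transaction id (assumed layout)
    loc:        dict mapping lock id -> set of sites storing the locked data
    Returns the new up list, the surviving locks, and the transactions that
    lost at least one lock and must therefore be aborted or backed up.
    """
    new_up = set(up_list) - set(crashed)
    released = {p for p in lock_table if not is_local(loc[p], new_up)}
    aborted = {lock_table[p] for p in released}
    remaining = {p: t for p, t in lock_table.items() if p not in released}
    return new_up, remaining, aborted
```

For instance, if lock P2 covers data stored only at site 3 and site 3 drops out, P2 is released and its transaction is reported for abortion, while locks whose data live entirely on surviving sites are untouched.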

If the LC crashes or becomes unavailable a
recovery mechanism called Logical Component
Recovery (LCR) takes place. As soon as an LC crash
is detected by any process engaged in a
conversation or exchange of messages with the
controller, a new process is nominated to be the
new LC. There is a globally known
ordering of the sites from which the nominee is
selected. If the nominee is up it accepts the
nomination by sending a message which circulates
through all the sites in the component. The
purpose of this message is to collect the
requests which have been received by the sites
but which are still in the pending list of at
least one site. Those requests will be
incorporated into the LOCK table at every site in a
subsequent phase of the recovery process. In
summary, the LCR mechanism amounts to establishing a
new LC and bringing all copies of the LOCK
table to the same value before normal operation is
resumed. Various race conditions are dealt with by
the details of the recovery protocol.
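
The selection of a nominee from the globally known ordering can be sketched as follows. This is an illustrative sketch only: it assumes sites are numbered 1 to n and ordered cyclically, as in the later discussion of the nomination order, that the search starts just after the failed LC, and that a hypothetical predicate is_up stands in for an acknowledged nomination message.

```python
def nomination_order(failed_lc, n):
    """Yield the other n-1 sites in cyclic order, starting after failed_lc."""
    node = failed_lc
    for _ in range(n - 1):
        node = node % n + 1  # successor of site n is site 1
        yield node


def nominate(failed_lc, n, is_up):
    """Return the first site after failed_lc that answers, or None.

    is_up(site) models the nominee acknowledging the nomination; a site
    that does not answer is assumed down and the next one is tried.
    """
    for candidate in nomination_order(failed_lc, n):
        if is_up(candidate):
            return candidate
    return None
```

With five sites and a failed LC at site 3, sites 4, 5, 1 and 2 are tried in that order.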

It is the responsibility of each LC to
periodically monitor the network for a
connection between two previously logically
disconnected components. The resulting merge, called LCM, is
always done pairwise between components, and in this
process the LC of one of the components plays an
active role while the other plays a passive one.
The first phase of LCM is composed of an
interconnection protocol by which LCs are
logically connected in such a way that one of them
is designated active and the other passive. This
protocol also enforces the pairwise merge condition
and is shown to be deadlock free. After a logical
connection has been established both LCs clear all
outstanding requests and reject further ones. In
the subsequent phase, the union of the LOCK tables
of the two components is made and the new LOCK
table is sent to all the sites in both components
in the form of a message which circulates through
them. This message signals the completion of the
merge. The active LC becomes the lock controller
for the new logical component.
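
The outcome of a merge can be pictured with a small Python sketch. It shows only the end state (union of the LOCK tables, active LC in charge), not the interconnection protocol or the circulating message. The dictionary layout and the choice to union the two up lists are assumptions made for illustration.

```python
def merge_components(active, passive):
    """Return the merged logical component.

    Each component is a dict with keys "lc" (site id of its lock
    controller), "up_list" (set of site ids) and "lock_table" (set of
    locks). This layout is hypothetical.
    """
    return {
        "lc": active["lc"],                                   # active LC takes over
        "up_list": active["up_list"] | passive["up_list"],    # assumed: union of sites
        "lock_table": active["lock_table"] | passive["lock_table"],
    }
```

Because locks granted in each component are local to that component, the union is taken here without any conflict check; that reading of the text is an assumption of the sketch.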

When a site which was down recovers, it is
made active by the Single Node Recovery (SNR)
mechanism which basically amounts to the
acquisition by that site of a new copy of the LOCK
table.

The three recovery mechanisms described above
do not interact with each other, as will be shown
later.  This  property  is  important  because  it 
allows  us  to  decompose  the  correctness  proofs  into 
a  proof  of  disjointness  and  then  proofs  for  each 
recovery  procedure  separately. 

The  recovery  mechanisms  will  be  shown  to  be 
robust  in  the  face  of  additional  failures.  In 
order  to  achieve  this  goal,  each  mechanism  is 
designed  in  such  a  way  that  a  partial  execution  of 
any  of  the  recovery  algorithms  does  not  destroy  any 
of  the  properties  we  want  to  prove  about  them. 

It is important to emphasize at this point
that, since all the lock requests are examined by
an LC in each logical component, locks granted by
LCs do not conflict with one another. This fact
enables us to consider the operation of the
algorithm for normal operation and for recovery as
if there were only one lock per logical component.
The reader is encouraged to keep this in mind as he
reads through this paper.


2 - Lock and Release Granting Algorithms

This section describes informally the
algorithms used to grant new locks and to release
existing ones. One would like those algorithms to
have the property that a lock is either granted or
released if and only if it is known to all the
sites. The basic structure of both algorithms can
be abstracted in what we call the Assured
Communication Protocol (ACP) which exhibits the
desired property outlined below.

Let there be a sender S, who wishes to send a
message M, originated at an external source ES, to
destinations D1, D2, ..., Dn. Each site i keeps
two message buffers: temp_buffer(i) and
final_buffer(i). ACP is such that message M will
only be in final_buffer(i) if it is either in
temp_buffer(Di) or final_buffer(Di) for all
destinations Di. ACP can be described by the
following set of rules:

1. S receives a "MESSAGE REQUEST" or MR
message from ES and broadcasts an "ACCEPT
MESSAGE" or AM message, which contains M,
to all Di's, i=1,...,n. The message M is
placed in temp_buffer(S).

2. When an AM message is received by a
destination Di, the message M is placed in
temp_buffer(Di) and a "MESSAGE ACCEPTED"
or MA message is sent back to S.

3. When all the MA messages have been
received by S, M is moved to
final_buffer(S) and removed from
temp_buffer(S) and a "CONFIRM MESSAGE" or
CM message is broadcast to all
destinations.

4. The receipt of a CM message at destination
Di causes M to be moved into
final_buffer(Di) and removed from
temp_buffer(Di).

A  variant  of  this  approach  with  additional 
acknowledgment  messages,  called  a  two-phase  commit 
protocol,  is  described  in  [11]  and  [12]. 
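
The four ACP rules can be traced with a small single-process Python sketch. It is a sketch under stated assumptions rather than the paper's algorithm: message passing is replaced by direct function calls, no crashes or timeouts are modeled, and the names Site, acp and invariant_holds are hypothetical. The generator pauses after each step so the ACP property can be checked throughout.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    temp_buffer: set = field(default_factory=set)
    final_buffer: set = field(default_factory=set)

    def holds(self, m):
        return m in self.temp_buffer or m in self.final_buffer


def invariant_holds(sender, dests, m):
    """If M is in any final_buffer, every destination holds M somewhere."""
    in_final = any(m in s.final_buffer for s in [sender] + dests)
    return (not in_final) or all(d.holds(m) for d in dests)


def acp(sender, dests, m):
    """Run ACP for message m, yielding a label after each step."""
    sender.temp_buffer.add(m)              # rule 1: MR received, AM broadcast
    yield "AM broadcast"
    acked = set()
    for d in dests:                        # rule 2: Di stores M, replies MA
        d.temp_buffer.add(m)
        acked.add(d.name)
        yield f"MA from {d.name}"
    if acked == {d.name for d in dests}:   # rule 3: all MA messages received
        sender.temp_buffer.discard(m)
        sender.final_buffer.add(m)
        yield "CM broadcast"
        for d in dests:                    # rule 4: Di moves M to final_buffer
            d.temp_buffer.discard(m)
            d.final_buffer.add(m)
            yield f"CM applied at {d.name}"
```

Stepping through acp and evaluating invariant_holds after every yielded step shows that M never reaches a final buffer before every destination holds a copy, which is the property the lock granting and releasing algorithms rely on.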

Several  other  details  are  also  worth  keeping 
in  mind.  As  mentioned  before,  each  LC  keeps  a  list 
of  the  sites  in  the  component  which  are  up.  A  node 
i  is  removed  from  this  list  by  the  LC  each  time 
that  the  underlying  network  protocols  fail  to 
deliver  a  message  to  site  i  (after  timeout  and 
retransmission  occurred  a  certain  number  of  times)  . 
An  up  list  is  also  modified  by  the  execution  of  any 
of  the  three  recovery  mechanisms.  A  copy  of  the  up 
list  is  also  kept  by  each  LLC.  Every  update  to  the 
up  list  by  the  LC  is  transmitted  to  all  LLCs  in  the 
component.  Note  that  no  additional  message  traffic 
is  generated  by  those  updates  since  they  can 
"piggyback"  on  other  messages.  The  reason  for 
keeping  local  copies  of  the  up  list  is  merely  a 
matter  of  performance,  since  the  up  list  determines 
to  some  extent  the  set  of  nodes  which  should 
participate  in  the  LCR  or  LCM  recovery  mechanisms, 
as  will  be  seen  later.  Also,  every  time  that  a 
change  in  the  up  list  causes  certain  locks  not  to 
be  local  any  more,  all  non-local  locks  are  released 
and  the  affected  transactions  aborted. 


2.1 - Lock Granting Algorithm

Application programs issue lock requests by
sending a "LOCK REQUEST" or LR message to the LC.
This message contains the lock or 3-tuple which the
user would like to be entered in the LOCK table.
The LC decides whether the lock can be granted or
not. If the requested lock conflicts with other
active locks a scheduling decision must be taken by
the LC as to whether to preempt any transaction or
make the requestor wait. That decision is not
the concern of this paper. If there are no
conflicts and the lock is local to the component
the LC must notify every LLC in its component that
a new entry should be appended to their LOCK
tables. Actually, instead of inserting the lock
directly into the LOCK table, an LLC appends it to
a list of pending lock requests, called an L-list.
The reason for this is to prevent copies of the
LOCK table from becoming inconsistent if the LC
crashes.

The  basic  structure  of  the  Lock  Granting  and 
Lock  Releasing  algorithms  is  the  same  as  that  of 
the  ACP  protocol,  where  AP,  LC,  LLCi  and  LOCK  table 
correspond  to  ES,  S,  Di  and  final_buffer  in  ACP, 
respectively. Also, the message M in ACP should be
considered as a lock request for the Lock Granting
algorithm and as a release request for the Lock
Releasing one. For the Lock Granting Algorithm, in
particular, temp_buffer corresponds to an L-list.


2.2 - Lock Releasing Algorithm

A similar procedure is followed when an AP
issues a lock release request, by sending to the LC
a "RELEASE REQUEST" or RL message. Each site keeps
a list of pending release requests or an R-list for
the same reasons we introduced the L-list. The
R-list corresponds to temp_buffer in the ACP
protocol.


2.3 - Some Definitions and Proofs

We will show here that, if no crash occurs,
the Lock Granting and Lock Releasing algorithms
have the property that a lock is only granted or
released if all the sites in the component know
about the request. In order to make this statement
more precise consider the following definitions.
Let LT(i), L(i) and R(i) be the LOCK table, L-list
and R-list at site i respectively.


DEFINITION 1 (Lock Request Presence): A lock
request or a lock is said to be present at site
i if the lock is either in LT(i) or in
L(i).

DEFINITION 2 (Release Request Presence): A lock
release request is said to be present at site i
if it is either in R(i) or if the associated lock
is not in LT(i).

The proof for the following two assertions, as
well as for all other assertions in this paper, can
be found in [2].

ASSERTION 1: If a lock is in LT(i) for some
i=1,...,n and in the L-list for at least one site,
then this lock is present in every other site of
the component.

ASSERTION 2: Let x be a lock and y its associated
release request. If x is in LT(i) for at least one
site in a logical component but not in all of them
and y is in at least one R-list, then y is present
in every other site.

Assertions 1 and 2 together lead directly to
the following result.

THEOREM 1: Let C be a logical component, LC its
lock controller and U the set of sites in C. If no
crashes ever occur then a lock request is only
granted by the LC after it is present at all the
sites in U and a lock is only released if the
associated release request is present at every site
in U.


3 -

So far we have described the protocol for
requesting locks and releasing them, assuming that
no crash occurred. Communication links,
processors, operating systems and processes are
some examples of sources of crashes.

The three already mentioned recovery
mechanisms will be presented here. These
mechanisms will be proven to be robust with respect
to additional failures. To be robust, the
protocols must preserve logical component internal
and mutual consistency as defined below, if any
changes have been made to any permanent information
(like LOCK tables, up lists or LC id's) at any
node.

DEFINITION 3 (LT-consistency): The set of LOCK
tables of a Logical Component is said to be
LT-consistent if assertions 1 and 2 hold at any time.

DEFINITION 4 (Logical Component Internal
Consistency): A logical component is said to be
internally consistent if the set of its LOCK tables
is LT-consistent and if there is one and only one
LC, whose identity is known to every node in the
component.

DEFINITION 5 (Logical Component Mutual
Consistency): A set of logical components is said
to be mutually consistent if all of them are
internally consistent and if there is no lock
present at any LOCK table of one of them which
conflicts with another such lock of any other
component.

Definition 5 covers the previous two, and
specifies the property which is required of
recovery.


The  recovery  protocols  have  been  designed  so 
that  all  crashes  which  can  occur  during  a  recovery 
phase  fall  into  one  of  the  two  disjoint  classes, 
which  we  call  terminal  and  transparent  failures. 

A  terminal  crash  causes  the  entire  recovery 
mechanism  to  be  aborted  and  restarted.  The 
possible  conditions  under  which  terminal  crashes 
occur  are  shown  to  leave  the  protocol  in  a  robust 
state,  as  defined  above.  A  transparent  crash  is 
defined  to  be  one  which  does  not  affect  the 
continued  correct  operation  of  the  recovery 
process . 

Therefore,  if  all  crashes  can  be  shown  to  be 
either  terminal  or  transparent,  the  recovery 
protocols  are  robust.  As  we  will  see,  for  each  of 
the  recovery  mechanisms,  we  can  identify  a  point 
before  which  the  recovery  o?,;i  be  considered  as  not 
having  happened  at  all  and  after  which  it  is 
considered  to  be  successfully  carried  out.  This 
point  is  called  the  'completion  point'.  Crashes 
before  the  completion  point,  if  they  have  any 
effect  at  all,  are  shown  to  be  terminal.  Crashes 
after the completion point are shown to be
transparent.

The  three  proposed  recovery  mechanisms  will  be 
shown  to  occur  disjointly  in  time.  In  other  words, 
a  merge  of  two  logical  components  only  takes  place 
if both are in their normal state or are not
recovering from a Logical Component Crash. Also, a
site only becomes attached to a logical component
if this component is in its normal state. These
important properties will allow us to state and
prove  separate  theorems  concerning  each  one  of 
them. 


2.1 - Logical Component Recovery (LCR)

We  will  now  show  how  an  LLC  may  become  an  LC 
if  the  LC  crashes.  A  crash  of  the  LC  can  be 
detected  by  any  process  engaged  in  a  conversation 
or exchange of messages with it. As an example, an
AP may time out while waiting for a reply from the
LC for a lock or lock release request. In every
case, the process which detects a crashed LC is
responsible for nominating a new LC. For this
purpose, we will assume that the distinct sites or
nodes in the underlying network are arranged in a
linear order such that node #i precedes node
#(i+1) mod n. Let this order be called the
nomination order. So, whenever a process detects a
failed LC it nominates the next node which is up in
the nomination order to the position of LC. This
nomination is accomplished by the issue of an
"ACCEPT NOMINATION" or AN message by the nominator.
If  this  message  is  not  acknowledged  after  a  certain 
number  of  times  it  has  been  retransmitted,  the 
nominator  assumes  that  the  nominee  is  down  and 
sends  an  AN  message  to  the  next  site  in  the 
nomination  order.  However,  it  may  be  the  case  that 
the  originally  nominated  node  was  not  down,  as 
assumed  by  the  nominator,  but  that  due  to  certain 
conditions  in  the  network  its  reply  was  seriously 
delayed.  So,  it  seems  that  more  than  one  LC  could 
be  nominated  in  this  process!  Let  us  neglect  this 
issue  for  the  moment,  while  we  describe  the 
recovery  procedure,  and  show  later  how  such  an 
undesirable  situation  can  be  easily  avoided.  The 
nominee  is  first  responsible  for  checking  that  the 
old  LC  is  actually  dead  (since  the  nomination  may 
have come from an errant AP). Then the nominee
must  notify  every  other  site  that  it  has  accepted 
the  nomination.  Moreover,  the  nominee  must  make 
sure  that  all  the  copies  of  the  LOCK  table  be  made 
equal  to  the  one  held  by  the  crashed  LC.  From  now 
on, we will refer to the crashed LC as the 'old LC'
and to the nominee as the 'new LC'.
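
The nomination rule described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `nominate`, the `acks` callback (standing in for "the AN message was acknowledged") and the retry limit are hypothetical and do not come from the report. The sketch also assumes that the search starts at the node following the old LC in the nomination order.

```python
from typing import Callable, List, Optional, Tuple


def nominate(old_lc: int, n: int, acks: Callable[[int], bool],
             max_retries: int = 3) -> Tuple[Optional[int], List[int]]:
    """Walk the nomination order (node #i precedes node #(i+1) mod n).

    Returns the first node that acknowledges an AN message, together with
    the list of nodes that were tried without success before it.
    """
    tried_without_success: List[int] = []
    for step in range(1, n):
        candidate = (old_lc + step) % n
        # Retransmit the AN message a bounded number of times (assumed limit).
        for _ in range(max_retries):
            if acks(candidate):
                return candidate, tried_without_success
        # No acknowledgement: the nominator assumes the candidate is down.
        tried_without_success.append(candidate)
    return None, tried_without_success


if __name__ == "__main__":
    up_nodes = {1, 2}
    print(nominate(3, 5, lambda node: node in up_nodes))
```

The list of unsuccessfully tried nodes is returned because the nominator needs it later, when conflicts between several nominees have to be resolved.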


The  process  by  which  the  new  LC  becomes  the 
actual  LC  can  be  divided  into  two  phases:  a 
'notification  phase'  and  a  'LOCK  table  update 
phase' . 

In  the  notification  phase  all  the  nodes  in  the 
component,  as  indicated  by  the  up  list  U,  are 
informed  of  the  identity  of  the  new  LC.  Also,  in 
this  phase  enough  information  is  gathered  in  order 
to  appropriately  update  the  LOCK  tables  in  the 
subsequent phase. The necessary information is
described by the sets L and R as defined below.

DEFINITION 6 (Set L - set of locks to be added to all LTs):

L = { X | X is in L(i) for some i in C and
X is present in all sites in U }


The new LC, upon nomination, will issue a
message called "NOMINATION ACCEPTED" or NA. This message
will circulate once through the set of all sites in
U  (including  the  site  where  the  new  LC  runs)  in  a 
predetermined  order. 

In  order  for  the  set  L  to  be  constructed,  two 
sets, L1 and L2, are formed during the NA cycle.
L1 is the set of locks which are present at all
sites, while L2 is the set of locks which are in
all the LOCK tables. By definition 6, the set L is
the difference between L1 and L2.


The set R is also made out of two sets R1 and
R2. R1 is the set of lock release requests which
are not present in at least one site, and R2 is the
set of lock release requests in the R-list of at
least one site. The difference R2 - R1 is the set
of locks which are present at every site, which by
definition 7 is the set R.

Every node, other than the new LC, in the NA
cycle receives partially constructed sets L1, L2,
R1 and R2, adds its contributions to them and
places the new versions of the sets into the NA
message which is forwarded to the next node in the
cycle. When the NA message returns to the new LC,
the sets L and R are completed. Also, the up list
U for the new LC will be initialized with the sites
which participated in the above described cycle.
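
The accumulation of L1, L2, R1 and R2 during the NA cycle can be illustrated with plain set operations. The sketch below is an assumption-laden model, not the report's code: the `Site` record and its four fields are hypothetical names for "requests present at a site", "the LOCK table of a site" and "the R-list of a site", and R1 is computed as the requests seen somewhere minus the requests seen everywhere.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass
class Site:
    present_locks: Set[str]      # lock requests present at this site
    lock_table: Set[str]         # locks recorded in this site's LOCK table
    present_releases: Set[str]   # release requests present at this site
    r_list: Set[str]             # release requests in this site's R-list


def na_cycle(sites: Iterable[Site]) -> Tuple[Set[str], Set[str]]:
    """Visit the sites in cycle order and return the sets (L, R)."""
    sites = list(sites)
    first = sites[0]
    l1 = set(first.present_locks)            # present at all sites so far
    l2 = set(first.lock_table)               # in all LOCK tables so far
    seen = set(first.present_releases)       # present at some site
    everywhere = set(first.present_releases)  # present at every site so far
    r2 = set(first.r_list)                   # in the R-list of some site
    for site in sites[1:]:
        # Each node adds its contribution before forwarding the NA message.
        l1 &= site.present_locks
        l2 &= site.lock_table
        seen |= site.present_releases
        everywhere &= site.present_releases
        r2 |= site.r_list
    r1 = seen - everywhere                   # missing from at least one site
    return l1 - l2, r2 - r1


if __name__ == "__main__":
    cycle = [
        Site({"x", "y"}, {"y"}, {"p", "q"}, {"p"}),
        Site({"x", "y"}, {"x", "y"}, {"p"}, set()),
        Site({"x", "y", "z"}, {"y"}, {"p", "q"}, {"q"}),
    ]
    print(na_cycle(cycle))
```

Only intersections decide membership in L and R, so a site that drops out of the cycle cannot have forced an element into either set.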

After  the  notification  phase  is  over,  the  new 
LC  will  send  a  message  to  every  LLC  asking  them  to 
update  their  LOCK  tables.  This  message  is  called 
an  "UPDATE  TABLE"  or  UT  message,  and  it  carries 
within  it  the  sets  L  and  R. 


Having  updated  the  LOCK  table,  each  LLC  sends 
a  "TABLE  UPDATED"  message  or  TU  message  to  the  new 
LC.  After  receiving  a  TU  from  every  up  site  the 
new  LC  becomes  the  actual  LC  by  notifying  all  the 
LLCs  that  they  can  resume  their  normal  activity. 
For this purpose the LC broadcasts a "RESUME NORMAL
ACTIVITY" or RNA message. The new value for U is
the set of sites from which the LC received a TU
message. This new value for U is included in the
RNA message, thus allowing every node in U to know
the set U.
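
The 'LOCK table update phase' can be sketched as below. The two-step update (add the locks in L, then remove the locks in R) follows the description given in the proof of Lemma 1; everything else is an assumption made for illustration, including the function names, the dictionary of tables keyed by site, and the `crashed` parameter that models sites from which no TU message arrives.

```python
from typing import Dict, Iterable, Set


def update_lock_table(lock_table: Set[str], L: Set[str], R: Set[str]) -> Set[str]:
    """Apply a UT message: first add the locks in L, then remove those in R."""
    return (set(lock_table) | set(L)) - set(R)


def lock_table_update_phase(tables: Dict[int, Set[str]], L: Set[str], R: Set[str],
                            crashed: Iterable[int] = ()) -> Set[int]:
    """Update every reachable LOCK table in place.

    Returns the new up list U: the sites that answered with a TU message.
    This is the value the new LC would place in the RNA broadcast.
    """
    crashed = set(crashed)
    new_up_list: Set[int] = set()
    for site in list(tables):
        if site in crashed:
            continue  # no TU message is received from a crashed site
        tables[site] = update_lock_table(tables[site], L, R)
        new_up_list.add(site)
    return new_up_list


if __name__ == "__main__":
    tables = {1: {"a", "p"}, 2: {"a", "x", "p"}, 3: {"a"}}
    print(lock_table_update_phase(tables, {"x"}, {"p"}, crashed=[3]), tables)
```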


Let us now describe how we can guarantee, and
in effect prove, that only one LC will emerge from
the notification phase. Recall that a nominator
will nominate the next up site in the
nomination sequence. Let us make the following
definition:


So, a lock X is in L if it is present at every
site but it is not in at least one LOCK table,
i.e., all the sites received an "ACCEPT LOCK"
message from the old LC, but at least one did
not receive a "CONFIRM LOCK" message.


DEFINITION 7 (Set R - set of locks to be deleted
from all LTs):

R = { X | X is in R(i) for some i in U and
X is present in all sites in U }

So, if a release request is in R, then all
sites in U have already received an "ACCEPT
RELEASE" message from the old LC.


DEFINITION 8 (trial sequence T[j,k]): A trial
sequence T[j,k] is the sequence of site numbers
for which an "ACCEPT NOMINATION" message has been
unsuccessfully sent by a nominator j, before j
sent an AN message to site #k.


For every AN message sent from site #j to site
#k we include the sequence T[j,k] as part of it.
This sequence will also be included as part of the
"NOMINATION ACCEPTED" message which circulates
through the set of sites. The purpose of this is
to allow any site to resolve any conflict that can
arise due to the race conditions discussed earlier
in the paper. Namely, it is possible that more
than one LC was nominated and consequently more
than one NA message (from distinct sources) would
be circulating. Conflicts are resolved by giving
preference to the last LC to be nominated. NA
messages originated by other nominated LCs are
killed when they are detected to belong to the
improper LC.


In many instances, in the CLC protocol, we
require a certain message to circulate through a
set of nodes, as is the case of the NA message.
Let us call such messages 'circular messages'.
They always have a source or generator which is
responsible for sending them through a cycle. The
underlying network protocols assure us that
messages will not get lost while going from one
site to another by the use of time-out and
retransmission schemes. However, a circular
message can still be lost if a node in the cycle
crashes after receiving it but before being able to
forward it. The loss of a circular message can be
prevented  by  having  each  node  in  the  cycle  send  to 
the  circular  message  generator  a  copy  of  it,  but 
only  after  it  was  forwarded  to  the  next  node  in  the 
sequence.  Now,  the  source  is  able  to  detect  a 
cycle  interruption  and  it  can  appropriately  resume 
it  by  sending  the  last  copy  of  the  message  to  the 
appropriate  site.  This  source  acknowledgment 
scheme  at  the  CLC  protocol  level  will  be  assumed 
to  exist  whenever  a  circular  message  is  necessary. 
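
The source acknowledgment scheme can be illustrated with a small simulation. It is a sketch under assumptions: the function `circulate`, the `crashing` set (nodes that receive the message and crash before forwarding it) and the `contribute` callback are hypothetical, and time-outs are modelled simply by the source noticing that no copy came back from a node.

```python
from typing import Callable, List, Set, Tuple


def circulate(cycle: List[str], crashing: Set[str],
              contribute: Callable[[str, list], list]) -> Tuple[list, List[str]]:
    """Send a circular message around `cycle`, whose first entry is the source.

    Every surviving node forwards the message and then sends a copy to the
    source. If a node crashes after receiving the message, the copy in flight
    is lost and the source resumes the cycle from the last copy it holds.
    """
    source, rest = cycle[0], cycle[1:]
    last_copy = contribute(source, [])   # the source keeps its own copy
    participants = [source]
    for node in rest:
        if node in crashing:
            # No copy arrives from this node: the source detects the
            # interruption and resends `last_copy` to the following node.
            continue
        message = contribute(node, list(last_copy))
        last_copy = message              # copy sent to the source after forwarding
        participants.append(node)
    return last_copy, participants


if __name__ == "__main__":
    add_name = lambda node, payload: payload + [node]
    print(circulate(["A", "B", "C", "D"], {"C"}, add_name))
```

The nodes that complete the cycle are returned as well, since in the NA cycle they are the ones used to initialize the up list of the new LC.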


It  should  be  noted  that  if  an  application 
program issues a lock or release request and the LC
fails before the request is present at every site,
the  request  will  never  appear  in  the  local  LOCK 
table  even  after  the  LCR  is  completed.  Therefore, 
APs  should  timeout  for  requests  and  resubmit  them. 


3.1 - Proofs About LCR

We would like to prove now that the
notification phase ends with one and only one LC
having been successfully nominated, and that all
sites know the correct new LC identification. As a
first step we state assertions 3 and 4 which are
concerned  with  the  behavior  of  LCR  given  that  no 
additional  crashes  occur. 

ASSERTION 3: Given that no additional crashes occur
during LCR, there will be one and only one LC whose
identification is known to all sites in the
component at the end of the notification phase.

The  proof  for  this  assertion  is  based  on  the 
operation  of  the  trial  sequence  mechanism  described 
above . 


Let a globally accepted lock (release)
request be one which is in all L-lists (R-lists) of
a logical component.

ASSERTION 4: Given that no additional crash occurs,
the  following  is  true  at  the  end  of  the  LCR 
mechanism.  All  the  copies  of  the  LOCK  table  for  a 
logical  component  are  identical  to  the  value  that 
the  LOCK  table  of  the  crashed  LC  would  have  if  all 
the  globally  accepted  requests  were  allowed  to 
complete  before  the  crash  of  the  LC. 

The  proof  for  this  assertion  considers  a 
snapshot  of  all  LOCK  tables  when  a  crash  occurs. 
It is first assumed that there are no globally
accepted requests. In this case, the union of the
LOCK table of the crashed LC, LT(oldLC), with the
LOCK table of a given site i, LT(i), is considered.
It can be shown that all the locks in LT(oldLC)
but not in LT(i) will be included in LT(i) by LCR.
Also, all the locks in LT(i) but not in LT(oldLC)
are removed from LT(i) by LCR. Finally, all the
locks in LT(i) and LT(oldLC) are not affected by
the LCR mechanism. If there are globally accepted
requests they will be included in the sets L and R
by definition of these sets. Therefore, the LOCK
table of all the sites in the component will be
updated in exactly the same way that LT(oldLC)
would have been if all globally accepted requests
had completed. Given these assertions we prove
the  robustness  of  the  LCR  mechanism. 

THEOREM  2:  The  Logical  Component  Recovery  (LCR) 
algorithm  is  robust. 

Proof:  The  completion  point  for  this  algorithm 
occurs  when  the  LC  has  already  sent  all  the  RNA 
messages.  The  only  terminal  crash  is  a  newLC 
failure  before  this  point.  This  crash  when 
detected  will  cause  another  LC  to  be  nominated 
and  the  LCR  mechanism  to  be  restarted.  This 
crash can occur at three different points:

i) before any LOCK table has been updated.

ii) after some but not all LOCK tables have
been updated.

iii)  after  all  LOCK  tables  have  been  updated. 

In  case  i)  it  is  clear  that  the  partially 
executed  LCR  has  no  effect  at  all.  In  case  iii) 
all LOCK tables will be identical, therefore
internal consistency for the component in
question is trivially satisfied. Case ii)
requires us to show that the set of LOCK tables
of a component is LT-consistent. We enunciate
and prove this statement as the following lemma.

LEMMA  1:  Given  a  logical  component  where  the  set 
of  LOCK  tables  is  LT-consistent,  then  the  update 
of  the  LOCK  table  as  indicated  by  the  sets  L  and 
R in some but not all of the nodes of the
component  preserves  LT-consistency . 

Proof:  Let  i  be  a  site  for  which  the  LOCK 
table  has  been  updated.  The  LOCK  table  is 
updated  in  two  steps.  In  the  first  one,  all 
the locks in the set L are added to LT(i).
Addition of a lock x at some sites but not at
all does not violate assertion 1 since x is,
by assumption, a member of the set L and
therefore is present at every site. The
second step is the removal from LT(i) of all
the locks in the set R. Removal of a lock
from the LOCK table at a given site still leaves
it present at this site. Since, by
assumption, the LOCK table has not yet been
updated  at  all  sites,  the  locks  removed  from 
LT(i)  are  in  the  LOCK  table  of  at  least  one 
site  and  are  present  at  all  sites.  Thus, 
assertion  2  is  also  valid  and  the  proof  is 
complete . 

Now,  it  remains  for  us  to  analyze  the 
transparent  failures.  Those  are  all  the 
failures  other  than  the  newLC  crash  already 
discussed.  We  can  have  either  a  process  or 
processor  failure  which  simply  knocks  out  one  of 
the sites in the component, or the component can
be partitioned into two or more components. In
either  case,  a  set  of  one  or  more  nodes  are 
isolated  from  the  set  of  nodes  which  participate 
in  the  LCR  mechanism.  The  nodes  in  this  set 
will  not  be  considered  any  more  for  the  rest  of 
the  LCR  algorithm.  However,  we  have  to  show 
that  no  inconsistencies  are  generated  by  a  node 
dropping out during the execution of LCR.

For  this  purpose,  we  will  examine  all  the 
possible  instants  at  which  a  node  j  may  crash. 

CASE 1: during the 'notification phase'

Here  we  have  to  show  that  the  sets  L  and  R 
will  not  be  perturbed  by  any  contributions 
already  made  to  them  by  node  j.  Node  j  can 
crash  at  three  possible  instants. 

CASE 1.1: before the NA message first reaches
it.

In this case node j is simply removed from
the cycle without contributing to the formation
of either L or R.

CASE 1.2: after the NA message reaches it and
before it is forwarded to the next node in the
sequence.

Here, the node which sent the NA message to
node j will timeout, detect its crash and send
the NA message to the node which follows node j
in the sequence. Again no contributions have
been made to the sets L or R.


CASE 1.3: after the NA message has been
forwarded.

A crash of node j at this point is
equivalent to a crash of a node during the 'LOCK
table update' phase since node j already played
its role in the 'notification phase'.
Therefore, this case reduces to the next one to
be examined. The reader should notice that the
robustness of this recovery mechanism relies
heavily on the fact that elements are only added
to  the  sets  L  or  R  if  the  appropriate  requests 
are  present  at  all  sites  (intersection  approach) 
as  opposed  to  considering  requests  which  are 
present  in  at  least  one  site  (union  approach). 


CASE 2: during the 'LOCK table update phase'

A crash of node j during this phase will
have no effect upon other nodes, resulting only
in the removal of this node from the up list of
the logical component which is recovering.

Examination of all these cases completes
the proof [].


The above result allows us to relax the
assumption made in assertion 4 that no additional
crashes occur during LCR and state the following
assertion.

ASSERTION 5: At the end of the LCR mechanism, all
the copies of the LOCK table for a logical
component are identical to the value that the LOCK
table of the crashed LC would have if all the
globally accepted requests were allowed to complete
right before the crash of the LC.


Finally  we  prove  that  every  logical  component 
is  internally  consistent. 


THEOREM  3:  Every  logical  component  is  internally 
consistent 

Proof:  Let  C  be  any  logical  component.  We  have 
to  prove  that: 

i)  the  set  of  LOCK  tables  of  C  is  LT- 
consistent 

ii) there is one and only one LC for C.

Statement i) is clearly true for normal
operation of component C since assertions 1 and
2 were demonstrated for this case. Now, by
assertion  5  all  the  copies  of  the  LOCK  table  are 
identical  at  the  end  of  LCR.  So,  in  this  case 
LT-consistency  is  trivially  satisfied. 

Statement  ii)  was  proved  to  be  correct  in 
assertion  3  for  the  case  in  which  no  additional 
crashes  occur  during  LCR.  But,  by  theorem  2, 
LCR  is  robust.  This  allows  us  to  consider  the 
effect  of  LCR  as  if  no  additional  crashes  occur 
during  its  execution,  and  concludes  the  proof 

[]. 


3.2 - Single Node Recovery (SNR)

So far we have described how the system
recovers from a logical component crash. We show
now how a node which is down becomes active again,
or in other words, how it gets logically connected
to a logical component. Let node j be such a node.
The first step to become active is to find out who
is the LC. This step is carried out by sending the
"WHO IS THE LC ?" or WLC message to any up node.
Then, node j sends a message called "HI THERE" or
HT to the LC telling it that node j is alive
again. If the LC is not undergoing any kind of
crash recovery it will send its LOCK table and its
up list to node j. An "ACCEPT LOCK" or "ACCEPT
RELEASE" message is sent to node j by the LC for
every lock or release lock request for which not
all the LA or RA messages have been received.


3.11. - Robustness of SNR

THEOREM 4: The Single Node Recovery (SNR) algorithm
is robust.

Proof: Let j be the recovering node and let LCi
be the LC to which node j is trying to connect.
The proof is extremely simple since the
only two crashes of interest are: a) LLCj crash
and b) LCi crash. Case a) is clearly a terminal
case. Case b) is also a terminal case since a
crash of LCi, before it is able to send the
LOCK table to LLCj, prevents the LOCK table
from being received by node j, thereby requiring
SNR to be restarted. This completes
the proof [].


3.3 - Logical Component Merge (LCM)


As  a  result  of  the  Logical  Component  Recovery 
algorithm  an  LC  will  be  elected  in  each  logical 
component  of  the  network.  Transactions  which  are 
local  to  a  component  will  continue  to  be  serviced 
as  if  no  disconnecting  crash  had  occurred.  On  the 
other hand, transactions which span more than one
component will have to wait until the components
involved are brought together again. It is the
responsibility of each LC to detect when two
components are physically connected again and to
take the necessary steps to merge them into one
logical component. The merge of logical components
will always be done on a pairwise basis. The
Logical Component Merge mechanism is divided into
two phases, namely a 'reconnection detection'
phase and a 'merge' phase.

In the 'reconnection detection' phase, each LC
periodically sends a "WERE YOU ALIVE" or WYA
message to every node not in its up list. The
purpose of this message is to detect the existence
of sites which were not reachable before but which
were up. For the purposes of the description that
follows, let the two logical components to be
merged be called C1 and C2. Let LC1 and LC2 be
their respective LCs and U1 and U2 their respective
up lists. LC1 will take an active role during the
whole recovery phase, while LC2 will take a passive
one. As we will see, a crash of LC1 while the
recovery mechanism is in progress will result in an
abort, while a crash of LC2 after the 'reconnection
detection' phase is tolerated. Assume that
site #i in C2 received a WYA message from LC1. A
component is said to be in NORMAL status if it is
not undergoing any kind of recovery
mechanism. If component C2 is in its NORMAL
status, site #i sends a "YES I WAS" or YIW message
to LC1. This message carries within it the
identification of LC2.

At  this  point  LC1  has  to  establish  a 
connection  with  LC2.  This  connection  is  called  a 
primary-secondary  or  P-S  connection  type  with  LC1 
being the primary and LC2 the secondary. Since we
require that LCM be done on a pairwise basis, the
following conditions must be enforced by the
protocol that establishes a P-S connection:

C1: an LC cannot be primary (secondary) for more
than  one  P-S  connection. 

C2:  an  LC  cannot  be  primary  and  secondary 
simultaneously. 

The P-S connection is attempted by having LC1
send a "LET US MERGE" or LUM message to LC2. The
status of LC1 is now changed to ATTEMPT. If the
status of LC2 is NORMAL, which means that neither
Logical Component Merge nor Logical Component
Recovery is being attempted, LC2 sends a "MERGE
ACCEPTED" or MA message to LC1 and changes its
internal state to SECONDARY. Upon receipt of the
MA message the connection is considered to be
successfully established by LC1. If the status of
LC2 is not NORMAL then a "MERGE ATTEMPT REJECTED"
or MAR message is sent to LC1 which will
retry later or will try a connection with another
LC.

The above interconnection strategy could
clearly allow undesirable race conditions to occur,
such as having two LCs trying to play the role of
primary, leading the system into deadlock
situations. To avoid this problem, we assign a
site dependent priority to each LC (no two sites
have the same priority). LUM messages from lower
priority LCs are rejected, while LUM messages from
higher priority LCs, if received while a
connection has not yet been completed, i.e. the MA
message has not been received, cause the connection
being attempted to be broken. To this end the
primary sends a "CLOSE CONNECTION" or CC message to
its intended secondary.
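
The connection protocol can be sketched as a small state machine. This is a hedged reading of the prose and of the LUM/CC;MA transition cited later in the text, not a transcription of Figure 1: the class `LC`, its methods, the numeric priorities, and the handling of CC and MAR messages (returning to NORMAL) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NORMAL, ATTEMPT = "NORMAL", "ATTEMPT"
SECONDARY, ESTABLISHED = "SECONDARY", "CONNECTION ESTABLISHED"


@dataclass
class LC:
    name: str
    priority: int
    status: str = NORMAL
    peer: Optional[str] = None

    def start_merge(self, target: str) -> List[Tuple[str, str]]:
        """Try to become primary of a P-S connection with `target`."""
        if self.status != NORMAL:
            raise RuntimeError("a merge can only be started in NORMAL status")
        self.status, self.peer = ATTEMPT, target
        return [(target, "LUM")]

    def receive(self, sender: str, sender_priority: int,
                msg: str) -> List[Tuple[str, str]]:
        """Handle one message; return the (destination, message) pairs to send."""
        if msg == "LUM":
            if self.status == NORMAL:
                self.status, self.peer = SECONDARY, sender
                return [(sender, "MA")]
            if self.status == ATTEMPT and sender_priority > self.priority:
                # Break our own attempt and accept the higher priority LC.
                out = [(self.peer, "CC"), (sender, "MA")]
                self.status, self.peer = SECONDARY, sender
                return out
            return [(sender, "MAR")]
        if msg == "MA" and self.status == ATTEMPT and self.peer == sender:
            self.status = ESTABLISHED
        elif msg == "MAR" and self.status == ATTEMPT and self.peer == sender:
            self.status, self.peer = NORMAL, None   # retry later (assumed)
        elif msg == "CC" and self.status == SECONDARY and self.peer == sender:
            self.status, self.peer = NORMAL, None   # primary gave up (assumed)
        return []


if __name__ == "__main__":
    a, b = LC("a", 3), LC("b", 2)
    b.start_merge("c")
    a.start_merge("b")
    print(b.receive("a", 3, "LUM"), b.status)
```

In this sketch an MA message that arrives after the attempt was abandoned is simply ignored, which is one way of keeping an LC from being primary and secondary at the same time.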

That the protocol outlined above satisfies
conditions C1 and C2 is proved in section 4.1.
Figure 1 shows a state transition diagram
describing the interconnection protocol. This
protocol is the same for every node. Nodes of the
diagram are STATUSes, while arc labels are of the
form R/T, where R is the message whose arrival
triggers the transition and T is a sequence of
actions (transmission of messages) which occur as a
consequence of the transition.

After a P-S connection has been established
between LC1 and LC2, they will not accept any more
new lock or lock release requests from nodes in
their components and will complete all outstanding
ones. An outstanding request is one for which
AL or AR messages have been already sent but not
all the corresponding LA or RA messages were
received. After all outstanding requests have been
completed by LC2 it sends to LC1 an
RTM message containing as arguments the up list
U2 and the LOCK table at LC2, which now is the same
for all nodes in C2. The receipt of the RTM
message by LC1 marks the end of the 'reconnection
detection' phase.

The 'merge' phase will construct the union of
the LOCK tables at both components. Notice that up
to this point no permanent change has been done to
any LOCK table, nor up list of any node. LC1 sends
a "SUBSTITUTE YOUR TABLE" or SYT message for a
cycle through the set of nodes in TEMP_U = U1 U U2.
The SYT message is the agent which confirms the
merge of the two components by carrying the
new LOCK table for the component. Also, the up
lists are updated and LC1 becomes the new LC of
the new logical component.
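
The 'merge' phase amounts to two set unions followed by an installation step at every node of TEMP_U. The sketch below assumes a dictionary representation of the SYT message and of a node's state; the names `build_syt` and `deliver_syt` are hypothetical.

```python
from typing import Dict, Set


def build_syt(u1: Set[int], lt1: Set[str], u2: Set[int], lt2: Set[str]) -> Dict[str, set]:
    """Assemble the contents of an SYT message at LC1.

    TEMP_U is the union of the two up lists, and the new LOCK table is the
    union of the LOCK tables of the two components.
    """
    return {"up_list": set(u1) | set(u2), "lock_table": set(lt1) | set(lt2)}


def deliver_syt(node: Dict[str, object], syt: Dict[str, set], new_lc: int) -> None:
    """Install the merged state at one node when the SYT message reaches it."""
    node["lock_table"] = set(syt["lock_table"])
    node["up_list"] = set(syt["up_list"])
    node["lc"] = new_lc


if __name__ == "__main__":
    syt = build_syt({1, 2}, {"x"}, {3}, {"y"})
    node3 = {"lock_table": {"y"}, "up_list": {3}, "lc": 3}
    deliver_syt(node3, syt, new_lc=1)
    print(node3)
```

A node changes its LOCK table, up list and LC identification only inside `deliver_syt`, so a crash of LC1 before the SYT message reaches any node leaves every node in its previous state.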


Robustness of LCM

THEOREM 5: The Logical Component Merge (LCM)
algorithm is robust.

Proof: The completion point for the LCM
algorithm is the point where the SYT message has
already been received and accepted by one LLC.


Let LT(i), U(i) and LC(i) be respectively the
LOCK table at site i, the up list at site i and
the LC identification as known by site i. It is
worth observing that changes to the values of
LT(i), U(i) and LC(i) at any site i other than
the LC1 site are only done upon receipt of the
SYT  message. 


Let  us  examine  the  possible  cases  of 
crashes  before  the  completion  point: 


CASE  1:  crashes  during  the  'reconnection 
detection’  phase 


A  crash  of  either  LC1  or  LC2  in  this  phase 
will  cause  LCM  to  be  aborted  and  a  LCR  to  be 
started at the component which had an LC crash.
Since no LOCK table nor up list has been changed
so far, this is a terminal crash. Since LC1 and
LC2  are  the  only  processes  involved  in  this 
phase,  we  conclude  that  this  phase  is  robust. 

CASE  2:  crashes  during  the  'merge'  phase 

A  crash  of  LC1  during  this  phase  will 
interrupt LCM and start LCR for component C1.
As  no  permanent  changes  have  been  done  already, 
this  is  a  terminal  crash.  A  crash  of  any  other 
node  (including  LC2)  clearly  does  not  affect  any 
other  node  nor  the  mutual  consistency  of  the 
merged  logical  component  []. 


4 - Disjointness of the Recovery Algorithms


We  show  here  that  there  is  no  interaction 
between  the  three  recovery  algorithms.  To  that 
effect  one  has  to  show  that: 


a) LCM is done pairwise.

b) LCR, LCM and SNR are mutually exclusive.

To verify condition a) we only need to show
that conditions C1 and C2 stated in section 3.5 are
satisfied by the P-S connection protocol. This
is done in section 4.1. Condition b)
is shown to hold in section 4.2.


is an a-arc from vertex i to vertex j if vertex i
is attempting a P-S connection to vertex j. Such
an a-arc is created as soon as vertex i enters the
ATTEMPT state (see figure 1). The graph G
displays the pattern of established and attempted
connections. Let e-G be the subgraph obtained from
G by considering only e-arcs of G and a-G be the
subgraph obtained by taking only the a-arcs.


Conditions C1 and C2 can now be rephrased as
follows:

C1.1: 0 <= indegree(v) <= 1 and 0 <=
outdegree(v) <= 1 for all v in e-G.

C2.1: indegree(v) * outdegree(v) = 0 for all v
in e-G.
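
Conditions C1.1 and C2.1 are easy to check mechanically on a list of e-arcs. The following sketch assumes that an e-arc is represented as a (primary, secondary) pair; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def satisfies_c11_c21(e_arcs: Iterable[Tuple[str, str]]) -> bool:
    """Check conditions C1.1 and C2.1 on the e-graph given by `e_arcs`."""
    arcs = set(e_arcs)
    outdegree = Counter(primary for primary, _ in arcs)
    indegree = Counter(secondary for _, secondary in arcs)
    vertices = set(outdegree) | set(indegree)
    # C1.1: no vertex is primary, or secondary, of more than one connection.
    c11 = all(indegree[v] <= 1 and outdegree[v] <= 1 for v in vertices)
    # C2.1: no vertex is primary and secondary simultaneously.
    c21 = all(indegree[v] * outdegree[v] == 0 for v in vertices)
    return c11 and c21


if __name__ == "__main__":
    print(satisfies_c11_c21([("a", "b"), ("c", "d")]))
    print(satisfies_c11_c21([("a", "b"), ("b", "c")]))
```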


Every  a-arc  will  either  be  deleted  from  G  when 
the  attempted  connection  is  broken  or  will  become 
an  e-arc  if  the  connection  is  successfully 
established.  So,  we  want  to  prove  the  following: 

THEOREM  6:  Given  a  graph  G  whose  e-graph  satisfies 
conditions  C1.1  and  C2.1,  the  new  e-graph  obtained 
from  G  as  new  connections  are  established  also 
satisfies  those  conditions. 

Proof:  It  can  easily  be  seen,  from  the  protocol 
specification,  that  condition  C1.1  is  satisfied 
not  only  by  the  initial  e-graph  but  also  by  the 
graph  G,  since: 

a) if there is already a connection between
vertices i and j or one is being attempted,
no new connection is attempted by either
vertex i or vertex j.

b) if a connection has already been
established or is being attempted, the
vertices involved reject all further attempts.


So, it remains for us to examine all the
possible cases in which condition C2.1 could
conceivably be violated in G and show that the
resulting e-graph obtained when one or more
a-arcs become e-arcs still satisfies this
condition. There are four possible cases, two
of which can never happen due to the protocol
specification, while the other two must
be examined. Given any three vertices a, b and
c, the cases are:

a) (a,b) and (b,c) are e-arcs.

b) (a,b) is an e-arc and (b,c) is an a-arc.

c) (a,b) is an a-arc and (b,c) is an e-arc.

d) (a,b) and (b,c) are a-arcs.


4.1 - Pairwise LCMs


Consider a directed graph G whose vertex-set
is the set of LCs and which has two distinct types
of arcs, namely e-arcs and a-arcs. There is an
e-arc from vertex i to vertex j if there is an
established P-S connection between vertices i and
j, vertex i being the primary. Equivalently, an
e-arc from vertex i to vertex j is said to be
created in G whenever vertex i enters the
CONNECTION ESTABLISHED state (see figure 1). There


Cases a) and b) are the impossible ones.
In case c) the attempted connection between a
and b will fail since there is an established
connection from b to c (see the self loop at the
CONNECTION ESTABLISHED state of the diagram of
figure 1). Therefore, arc (a,b) disappears.

In case d) nodes a and b are in the ATTEMPT
state. If (a,b) becomes an e-arc we can see
that the transition labeled LUM/CC;MA from state
ATTEMPT to the state SECONDARY is taken at
vertex b, causing the attempted connection (b,c)
to be broken. Therefore arc (a,b) becomes an
e-arc while arc (b,c) disappears. On the other
hand, if (b,c) becomes an e-arc in the first
place we are back to case c) which was already
examined. []


We take the opportunity here to prove that the
P-S connection protocol is such that all the a-arcs
in G will, in a finite time (of the order of
magnitude of the transmission delay time in the
network), either disappear or become e-arcs. In
other  words,  the  P-S  connection  protocol  is 
deadlock  free. 

THEOREM 7: The P-S connection protocol is deadlock
free.

Proof: We must prove that there can be no long
lasting cycles in G. The interesting case is,
of course, that of cycles made up only of
a-arcs since, as shown in the previous theorem,
any a-arc adjacent to an e-arc will disappear in
a finite time.

Consider a cycle in G and two adjacent
a-arcs (a,b) and (b,c) in the cycle. Vertices a
and b are in the ATTEMPT state. There are only
two possible cases to consider:

CASE 1: [PRIORITY(a) > PRIORITY(b)]: In this
case, if the "MERGE ACCEPTED" message from
vertex c is received by b before the "LET US
MERGE" message from a then (b,c) becomes an
e-arc and (a,b) disappears.

CASE 2: [PRIORITY(a) < PRIORITY(b)]: Here, arc
(a,b) will disappear since a has lower priority
than b.

In any event, the cycle will eventually be
broken. Note also that vertex c could be the
same as a and the above analysis is still valid.
[]
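
The Case 2 argument can be checked mechanically. The following is a minimal sketch, not the report's algorithm: it assumes every vertex has a distinct integer priority (the `priority` mapping and both function names are hypothetical) and applies only the Case 2 rule, dropping an a-arc (a,b) whenever a has lower priority than b. On any cycle at least one arc leads from a lower-priority vertex to a higher-priority one, so no cycle of a-arcs can survive.

```python
def drop_low_priority_arcs(a_arcs, priority):
    """Case 2 rule: arc (a, b) disappears if a has lower priority than b."""
    return {(a, b) for (a, b) in a_arcs if priority[a] > priority[b]}


def has_cycle(arcs):
    """Detect a directed cycle with a depth-first search."""
    graph = {}
    for a, b in arcs:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    visiting, done = set(), set()

    def visit(v):
        if v in done:
            return False
        if v in visiting:
            return True  # reached a vertex on the current search path
        visiting.add(v)
        found = any(visit(w) for w in graph[v])
        visiting.discard(v)
        done.add(v)
        return found

    return any(visit(v) for v in graph)


# Hypothetical example: a cycle of three attempted connections.
cycle = {("a", "b"), ("b", "c"), ("c", "a")}
priority = {"a": 3, "b": 1, "c": 2}
survivors = drop_low_priority_arcs(cycle, priority)
```

The same check applies to a two-vertex cycle, which corresponds to the remark that vertex c could be the same as a.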


FIGURE 2 - The state transition diagram.
[Legend, partly illegible: Ui is contained in U1; LCi is in Ui.]

The state [NORMAL, LCj, Uj] is a state which
resulted from a successful merge of component C1
with another component, for instance C2. The state
[NORMAL, LCi, Ui] is a state which resulted from a
successful Logical Component Recovery.

By inspection of the diagram, we observe that
a node can only go from one normal state to a
different normal state after one and only one
recovery mechanism has been completed. Therefore,
there is no interaction among the three recovery
mechanisms.


iL.2  -  p<3lointnes3  iiC  i,.QE,  LCM  nM  5M 

'.V  'I.'-''  dO*'l.n‘'  3 

as a directed graph whose vertices are
states of a network node and whose arcs represent
transitions between states. The state of a node i
is the 3-tuple [STATUS(i), LC(i), U(i)], where
LC(i), U(i) are as defined before. STATUS(i) is
[illegible] as viewed by site i. NORMAL status
indicates that neither LCR nor LCM is in progress;
RECOVERY means that LCR is taking place and
QUIESCENT indicates that LC(i) is rejecting further
requests. The labels on the arcs specify the
conditions upon which a transition between two
states occurs. These conditions can either be a
crash detection or a message arrival. The diagram,
shown in figure 2, shows all possible state
transitions for a node, other than LC1, which is in
a component C1, with LC equal to LC1 and up list
equal to U1. From every state there is a
transition to the DOWN state. These transitions are
not represented in the diagram for obvious reasons.


5. - Logical Component Mutual Consistency

Let us show that the CLC protocol
(including the recovery mechanisms) is such that
the set of Logical Components into which the
network is partitioned is mutually consistent.

THEOREM 8: The set of logical components into which
the network is partitioned is mutually consistent.

Proof: By theorem 3 each one of the logical
components is internally consistent. It remains
for us to prove that there can be no lock
present at any LOCK table of any component which
conflicts with another such lock of any other
component. This theorem is trivially true when
there is only one logical component. Further
net partitioning does not destroy this property
since locks are only granted if they are local
to a component, which implies that they do not
conflict with any other lock granted at any
other component. []


- Database Consistency

We show here that given a deadlock free,
consistency preserving locking mechanism for a
centralized database, the CLC protocol can be
used to implement an equivalent robust, deadlock
free, consistency preserving locking mechanism for
a distributed database (DDB). A database is said
to be in a consistent state if all the data items
satisfy a set of assertions or consistency
constraints. A transaction is a sequence of
accesses which take the database from a consistent
state into another consistent state. Thus, a
transaction is the unit of consistency. Let us
define an access as the pair (P,a) where P is a
logical description of the portion of the database
to be accessed and a is an access mode (e.g.
read, write, delete, etc.). If all the locks are
granted by a process which has complete knowledge
of every other active lock (as is the case with
the LC) and if every access is checked against the
LC copy of the LOCK table (this condition will be
relaxed later), to see whether the transaction
holds the necessary locks, then the lock
scheduler for a CDB described by Eswaran [ ] can
be implemented in a straightforward manner with the
use of the CLC protocol. Such a locking mechanism
has the properties of being robust and preserving
the consistency of the DB. Notice that deadlock
prevention or detection mechanisms can be carried
out by the LC since it has complete knowledge of
all activities in its component. Recall that if
the network is partitioned into more than one
component, locks granted in one of them do not
conflict with locks active in others. Therefore,
distinct LCs manage disjoint sets of "resources",
where a resource here means an individually
lockable data item in the DB. So, a deadlock
prevention or detection policy can be implemented
in each LC independently of all the others.

The requirement that every access be checked
against the LOCK table at the LC-site can be
relaxed in favor of having the access checking done
locally. In order for this to be possible a lock
must be considered to be active at a given site i
for a time interval T2 contained in the time
interval T1 during which the lock is active at the
LC-site; otherwise some portions of the DB could be
locked in conflicting modes for different
transactions. Figure 3 shows a double time
diagram displaying time at the LC-site and at a
given site i where a lock request is originated.
T1 starts when the "CONFIRM LOCK" message is sent
to every site in the component and ends with the
broadcast of the "CONFIRM RELEASE" message. T2
starts with the arrival of a CL message at site i.
Although a lock is only removed from a LOCK table
when the corresponding "CONFIRM RELEASE" message
arrives, it can be flagged as 'waiting for removal'
as soon as a "RELEASE LOCK" message is sent from
the LC to site i. For access checking purposes,
all flagged locks must be considered as non active.

An extra precaution that must be taken in this
case is to unflag all flagged locks after LCR has
taken place.
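
The flagging rule can be illustrated with a small sketch of the LOCK table copy kept at a site i. This is only an illustration under stated assumptions: the class `LocalLockTable` and its method names are hypothetical, locks are identified by plain strings, and message transport is not modeled (each method stands for the handling of one message named in the text).

```python
class LocalLockTable:
    """Copy of the LOCK table kept at a non-LC site (hypothetical structure)."""

    def __init__(self):
        self._flagged = {}  # lock id -> True if flagged 'waiting for removal'

    def confirm_lock(self, lock_id):
        # A "CONFIRM LOCK" (CL) message arrived: interval T2 starts here.
        self._flagged[lock_id] = False

    def release_lock(self, lock_id):
        # A "RELEASE LOCK" message was sent: flag the lock, do not remove it.
        if lock_id in self._flagged:
            self._flagged[lock_id] = True

    def confirm_release(self, lock_id):
        # "CONFIRM RELEASE" arrived: only now is the lock removed.
        self._flagged.pop(lock_id, None)

    def is_active(self, lock_id):
        # Flagged locks count as non active for access checking.
        return lock_id in self._flagged and not self._flagged[lock_id]

    def after_lcr(self):
        # Extra precaution: unflag all flagged locks once LCR has taken place.
        for lock_id in self._flagged:
            self._flagged[lock_id] = False


table = LocalLockTable()
table.confirm_lock("x")
table.release_lock("x")
active_while_flagged = table.is_active("x")  # False: flagged locks are non active
```

Because a lock is treated as non active from the moment it is flagged, the local interval T2 ends no later than T1 does at the LC-site, which is the containment the text requires.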

2. PERFORMANCE RESULTS

Some of the results of the cost and delay
analysis for the CLC protocol [2] are presented
here. The update model used in this analysis is
such that some of the previously defined messages
are grouped into a single physical message. These
results indicate that the average update delay,
Dupdt, does not depend directly on the size of the
network for many network topologies of interest and
its expression is given by

Dupdt = 2*T + 3*TMAX + W

where T is the average message delay introduced by
the network between two distinct sites, TMAX is the
average maximum delay between a sender and several
destinations and W is the average waiting time for
a lock request to be granted at the LC.
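
As a worked example of the update delay expression, read here as Dupdt = 2*T + 3*TMAX + W, the sketch below evaluates it for one set of values. The function name and the numbers are hypothetical and chosen only for illustration; any consistent time unit may be used.

```python
def update_delay(t, tmax, w):
    """Average update delay Dupdt = 2*T + 3*TMAX + W (all in the same time unit)."""
    return 2 * t + 3 * tmax + w


# Assumed values: T = 10, TMAX = 15, W = 5 (e.g. milliseconds).
example_delay = update_delay(10, 15, 5)  # 2*10 + 3*15 + 5 = 70
```

Note that the number of sites does not appear as a parameter, which reflects the statement that the delay does not depend directly on the size of the network.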

Lower and upper bounds for the average
recovery delay, R, are given by

R >= (n+1)*T + 3*TMAX

and

R < [a*(n-2) + n + 1]*T + 3*TMAX

Where  n  is  the  number  of  sites  in  the  network  and  a 
is  the  ratio  Tout/T  where  Tout  is  the  time  after 
which  a  nominator  assumes  that  the 
and  sends  another  "ACCEPT  NOMINATION"  message  to 
the  next  site  in  the  nomination  order. 
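
The two bounds on the average recovery delay, read here as R >= (n+1)*T + 3*TMAX and R < [a*(n-2) + n + 1]*T + 3*TMAX, can be evaluated with a short sketch. The function name and the sample values are hypothetical.

```python
def recovery_delay_bounds(n, t, tmax, a):
    """Return (lower, upper) bounds on the average recovery delay R.

    n    -- number of sites in the network
    t    -- average message delay T between two distinct sites
    tmax -- average maximum delay TMAX from a sender to several destinations
    a    -- ratio Tout/T, where Tout is the nominator's timeout
    """
    lower = (n + 1) * t + 3 * tmax
    upper = (a * (n - 2) + n + 1) * t + 3 * tmax
    return lower, upper


# Assumed values: 5 sites, T = 10, TMAX = 15, Tout = 3*T.
bounds = recovery_delay_bounds(5, 10, 15, 3)  # (105, 195)
```

The gap between the bounds is a*(n-2)*T, so it grows with both the timeout ratio and the number of sites.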

The average communications cost, Cupdt,
incurred by an update is

(rr;  -  Ii-*- 

where M is the average communications cost per
message. Lower and upper bounds for the recovery
cost Crec are given by

Crec  >=  ("I’n  -  2)''M 
and

Crec < (6*n - 4)*M


-  Extension 

It has been observed in most of the existing
distributed systems that a large fraction of the
generated transactions is local, in the sense that
the resources needed to satisfy a given transaction
are either located at the site of origin of the
transaction or in neighboring sites. This
observation suggests that improvements in
terms of communications cost and delay can be
achieved if one optimizes the operation of the
algorithm to adapt to such a highly skewed
distribution of activity. To illustrate the point,
consider a set of interconnected computer networks.
We believe that in such a case, most of the
operations will be confined to one computer network
while relatively few operations will cross network
boundaries.

This section outlines an extension to the CLC
protocol that permits the forms of performance
optimization needed for the cases discussed above.
The extension, which we call an HCLC (for
Hierarchical CLC) protocol, consists of a
hierarchical organization of resource controllers.
A tree of controllers is provided where the root is
considered to be at level 0 and all the children of
a controller at level i are at level i+1 in the
hierarchy.

Each  controller  (except  for  the  leaves)  serves 
as  an  LC  for  its  children.  Also,  each  controller 
(except  for  the  root  of  the  hierarchy)  acts  as  an 
LLC  for  its  parent.  Therefore,  each  controller  has 
to  maintain  two  distinct  LOCK  tables,  which  we  call 
parent-LT  and  child-LT.  The  parent-LT  for  the  root 
controller  contains  one  lock  for  the  whole  DB  in 
exclusive  mode.  The  child-LT  for  a  leaf  is  empty. 

An intuitive description of the normal
operation of the HCLC protocol can be easily
understood in the light of an example. Figure 4
shows a three-level hierarchy. Application
programs interact with lock controllers K1 and K2
at one level above the leaves (since the leaves are
LLCs). This interaction is the same as the AP-LC
interaction in the CLC protocol. The
application programs are not aware of the fact that
the controllers are hierarchically organized. Let
a lock request, x, from AP1 be submitted to K1. If
x conflicts with any other lock in child-LT(K1)
then the lock request is treated in the same way as
in the CLC protocol. If there is no conflict,


K1's parent-LT is searched for a lock y which
covers x. A lock x1 is said to cover a lock x2 if
the portion of the DB specified by x2 is contained
in the portion of the DB addressed by x1 and if the
lock mode specified by x1 is not weaker than the
lock mode in x2. The existence of a lock such as y
in parent-LT(K1) indicates that K1 currently has
control over the resources requested by AP1. If y
is found, the lock request x can be granted and to
this end K1 interacts with K3 and K4 in the same
way as an LC interacts with the LLCs in its
component. On the other hand, if y cannot be
found, the lock request x is submitted by K1 to K0.
K0 will act with respect to K1 and K2 in the same
way that K1 did with respect to K3 and K4. The
difference in this case is that since K0 is the
root there is a lock in parent-LT(K0) for the whole
DB in exclusive mode. This lock covers any other
lock.
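
The request path just described can be sketched as follows. This is a simplified illustration, not the report's implementation. It assumes that a portion of the DB is a set of item names, that there are only two lock modes with "read" weaker than "exclusive", and that two locks conflict when their portions overlap and at least one is exclusive. The names `Lock`, `Controller`, `covers` and `conflicts` are hypothetical, and the interaction with the LLCs is omitted.

```python
from dataclasses import dataclass, field

MODE_STRENGTH = {"read": 1, "exclusive": 2}  # assumed ordering of lock modes


@dataclass(frozen=True)
class Lock:
    portion: frozenset  # assumed representation: a set of data item names
    mode: str


def covers(x1, x2):
    """x1 covers x2: x2's portion lies inside x1's and x1's mode is not weaker."""
    return (x2.portion <= x1.portion
            and MODE_STRENGTH[x1.mode] >= MODE_STRENGTH[x2.mode])


def conflicts(x1, x2):
    """Assumed rule: overlapping portions with at least one exclusive mode."""
    return bool(x1.portion & x2.portion) and "exclusive" in (x1.mode, x2.mode)


@dataclass
class Controller:
    name: str
    parent: "Controller | None" = None
    parent_lt: list = field(default_factory=list)
    child_lt: list = field(default_factory=list)

    def request(self, x):
        """Return the name of the controller that had control, or None on conflict."""
        if any(conflicts(x, held) for held in self.child_lt):
            return None  # handled as in the CLC protocol (e.g. the request waits)
        if any(covers(y, x) for y in self.parent_lt):
            self.child_lt.append(x)  # a covering lock y exists: grant locally
            return self.name
        if self.parent is None:
            return None  # x lies outside the DB covered by the root's lock
        granted_by = self.parent.request(x)  # no covering lock: submit to parent
        if granted_by is not None:
            self.parent_lt.append(x)  # this controller now controls x
            self.child_lt.append(x)
        return granted_by


db = frozenset({"r1", "r2", "r3", "r4"})
k0 = Controller("K0", parent_lt=[Lock(db, "exclusive")])  # root: whole DB, exclusive
k1 = Controller("K1", parent=k0)
x = Lock(frozenset({"r1"}), "exclusive")
first = k1.request(x)  # no covering lock at K1, so the request goes up to K0
second = k1.request(Lock(frozenset({"r1"}), "read"))  # conflicts in child-LT(K1)
```

Since the root holds an exclusive lock on the whole DB in its parent-LT, the upward search always terminates at K0 for any request inside the DB.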

In an HCLC protocol, locks may be released
either explicitly or automatically. Locks in
child-LT(Ki), for i=1,2, are released explicitly
upon request from APs using the same mechanism
described in the CLC protocol. Locks in
parent-LT(Ki), for i=1,2, can be released
automatically as soon as there are no locks in the
corresponding child-LTs which depend upon them. To
this end, each lock y, in parent-LT(K), for any
controller K, has associated with it a list of
locks in child-LT(K) covered by y. Also, each lock
x in a child-LT(K) points to the lock y in
parent-LT(K) which covers x. When a lock x is
explicitly released from child-LT(K1) the lock list
for its corresponding lock, y, in parent-LT(K1) is
appropriately updated. Whenever this list becomes
empty, a release request may be automatically
generated by K1 and submitted to K0. In general,
the automatic release of locks can be propagated up
to the root.
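
The bookkeeping for automatic release can be sketched as below. The sketch assumes an eager policy (a parent lock is released as soon as its list becomes empty), that locks are identified by name, and that a lock y in parent-LT(K1) appears under the same name in child-LT(K0). The class and method names are hypothetical.

```python
class ParentLock:
    """A lock y in parent-LT(K) with the list of child-LT(K) locks it covers."""

    def __init__(self, name):
        self.name = name
        self.dependents = set()


class Controller:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.child_lt = {}   # child lock name -> covering ParentLock
        self.parent_lt = {}  # parent lock name -> ParentLock

    def grant(self, x, y):
        """Record child lock x as covered by parent lock y (both are names)."""
        lock = self.parent_lt.setdefault(y, ParentLock(y))
        lock.dependents.add(x)
        self.child_lt[x] = lock

    def release(self, x, released):
        """Explicitly release x; propagate automatic releases toward the root."""
        y = self.child_lt.pop(x)
        y.dependents.discard(x)
        # The root keeps its lock on the whole DB, so it never releases upward.
        if not y.dependents and self.parent is not None:
            del self.parent_lt[y.name]
            released.append((self.name, y.name))
            if y.name in self.parent.child_lt:
                self.parent.release(y.name, released)


k0 = Controller("K0")
k1 = Controller("K1", parent=k0)
k0.grant("y", "DB")   # K0 granted y to K1 under its whole-DB lock
k1.grant("x1", "y")
k1.grant("x2", "y")
released = []
k1.release("x1", released)  # y still covers x2: nothing is released upward
k1.release("x2", released)  # list for y is empty: y is released automatically
```

A policy that delays releases, as mentioned in the text, would replace the eager `if not y.dependents` step with a deferred decision.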

This hierarchical protocol can be easily
adjusted by policy decisions both to delay such
releases, and to establish early locks at higher
levels in anticipation of local lock requests.
Lock management policies analogous to LRU-like
memory management policies are obvious candidates.

For the set of interconnected computer
networks, a three-level hierarchy could be
constructed as follows. There is one LC per
computer network, all of them at level 1. Their
children, at level 2, are their corresponding LLCs.
Finally, the root is any site acting as a global
controller for the entire collection of computer
networks.


An interesting property of the proposed
extension is that there is always one controller
which is able to detect the existence of a cycle in
the lock-request graph. This controller is the
common ancestor, with the largest level number, to
all the controllers where requests in the cycle
were originated. In the example of figure 4, the
common ancestor to K1 and K2 is K0.
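
Finding the controller that can detect a cycle amounts to finding the deepest common ancestor in the controller tree. The sketch below assumes the hierarchy is given as a child-to-parent mapping; the function names are hypothetical, and a controller is counted as an ancestor of itself.

```python
def path_to_root(parent, k):
    """Return [k, parent of k, ..., root] using a child -> parent mapping."""
    path = [k]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def detecting_controller(parent, origins):
    """Common ancestor with the largest level number of all origin controllers."""
    origins = list(origins)
    common = set(path_to_root(parent, origins[0]))
    for k in origins[1:]:
        common &= set(path_to_root(parent, k))
    # Walking up from any origin, the first common node is the deepest one.
    for k in path_to_root(parent, origins[0]):
        if k in common:
            return k


# Hierarchy from the example: K0 is the root, K1 and K2 its children,
# K3 and K4 the children of K1.
parent = {"K0": None, "K1": "K0", "K2": "K0", "K3": "K1", "K4": "K1"}
detector = detecting_controller(parent, ["K1", "K2"])  # "K0"
```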


Crash  recovery  algorithms  for  the  HCLC 
protocol  must  include  mechanisms  to  reconstruct  the 
hierarchy,  in  addition  to  the  recovery  mechanisms 
present  in  the  CLC  protocol. 


3. - Conclusion

This paper outlines what we believe to be a
fairly general solution to synchronization issues
in distributed systems in the face of asynchronous
unplanned failures. The algorithms
for normal operation and recovery are robust with
respect to the criteria set up at the beginning of
this report. We are unaware of any other
synchronization protocols which simultaneously
satisfy each of those requirements.

The work is primarily suitable for
environments in which the cost, including delay, of
sending messages is not high relative to the
operations which are to be performed once locking
is complete. Locally distributed systems often
provide examples of such an environment.
Geographically distributed networks also fall into
this category if the amount of work to be performed
after locking is significant relative to
communications cost.

The protocols are also best suited for usage
behavior that cannot be directly predicted in
advance. It is assumed that query and update
activity will be largely ad hoc in nature - the
general case, which has been receiving
increasing attention in recent years.

The presentation of any substantial protocol
would not be complete without an outline of a proof
that the protocol is correct with respect to its
desired properties. A significant portion of this
paper is therefore devoted to that purpose.
Further analysis using automated tools is also
underway.

In conclusion, these protocols should help
demonstrate the practicality of integrated
cooperation of activities in distributed systems.


0. Abstract

UCLA Unix is a wholly new operating system whose architecture and implementation
are oriented toward highly reliable security and integrity enforcement while
supporting a wide degree of system functionality. The system, now operational,
demonstrates that it is possible to provide a convenient, efficient secure operating
system on conventional, third generation hardware architectures. This paper reports
on the development of UCLA Unix. Much of the discussion is concerned with the software
architecture which evolved, since a number of innovations are included with
surprisingly little mechanism. The methods employed to build and verify the system are
also described, and the impact of the requirement to support fully the standard Unix
operating system functionality is discussed.


1. Introduction

There has been considerable interest for some time in developing an operating
system which could be conclusively shown secure, in the sense that the information
stored on behalf of a heterogeneous user population was safely protected from
unauthorized access or modification, even in the face of skilled attempts to do so.
Early attempts to attain this goal consisted largely of auditing an existing system by
attempts at circumventing the controls, and then revising the implementation code to
block any successful paths that were found. Unfortunately, this approach failed in
producing a secure system, largely because third generation operating systems contain
so many errors that "penetration audits" followed by patches inevitably led to a
system whose controls were still easily penetrated.

*This research was supported by the Advanced Research Projects Agency of the
Department of Defense under Contract MDA 903-77-0211.

From a viewpoint of principle however, there was an even more fundamental
limitation to the early approaches, frequently mentioned: testing proves the presence
but not the absence of bugs. Therefore, a more strictly constructive method was
required, by which it would be possible conclusively to demonstrate the correctness of
the security controls. It was hoped that this goal would result in a much superior
system in other respects as well. The experience to be reported here strongly bears
out that expectation.

UCLA Unix is a kernel based system architecture developed in a manner by which
program verification techniques could be (and have been) applied. The system
interface is essentially identical to Unix as released by Bell Laboratories [Ritchie 74],
and the software presently runs on DEC PDP-11/45s and PDP-11/70s. The kernel
structure and verification procedures together provide a
powerful means by which the system's security and integrity can be demonstrated and
assessed. Support of the Unix interface illustrates the robustness and functionality
of the resulting system.

However, the kernel and verification goals imposed significant constraints on
the size, complexity and general architecture of the system. The result therefore is
quite different from what would have been expected otherwise. Nevertheless, in
retrospect, we are unaware of any decision forced by these goals which has not also had
the effect of simplifying the system's structure and improving overall reliability
and integrity. There has been no significant performance penalty either. The primary
cost in obtaining a secure operating system appears to be found in the care
required during design and development.

One important fallout of the system design is considerably enhanced system
integrity. Improvement results from the significant reduction in common mechanism
operating on behalf of all users, a characteristic that was necessary to make
verification and certification of the system practical.

In the next sections we outline the UCLA Unix architecture, together with
explanations for the design choices. Verification and the programming language are also
discussed, and illustrative examples of the effects of Unix functionality on the
system's operation are given.


2. Overall Architecture of UCLA Unix

The UCLA Unix architecture contains a number of major modules, whose relation to
one another is suggested by figure 1. The kernel should be thought of as an
operating system nucleus which provides about a dozen primitive operations callable from
user processes. That is, the kernel implements a number of abstract types and the
valid operations on each type. It is the only module in the system empowered to
execute hardware privileged instructions.

One of the abstract types implemented by the kernel is process. A process
contains two address spaces (supervisor and user mode on the large PDP-11s). An
operating system interface package resides in one address space. In the other,
application code is run. When an application program makes an operating system call,
control passes to the o.s. package which interprets the call. If necessary, the package
issues kernel calls or uses kernel facilities to send messages to other processes to
accomplish the needed action. All such calls or messages are controlled by the
kernel. Each process is a separate protection domain. The access rights of the domain
are represented by capabilities: a C-list for each process is maintained by the kernel.


There are several processes that are special, in that they perform system
related functions. Overall system security depends on the correct operation of two of
them.* One, called the policy manager, is the only process capable of altering
protection data, and is thus the site where various security policies may be
implemented. Type extensions to kernel objects, including file systems, typically would
also be supported here. In the UCLA system, security policy plus suitable primitives for
the Unix file system to support protection of individual files are built in the
policy manager process. The second, "initiator", process initially owns all terminals
(i.e. has capabilities for all of them) and is responsible for user authentication.
It tells the policy manager what user is to be associated with a given process.

There is one further process which differs from the typical processes employed
for applications programming. However, this one, a scheduler, is not relevant to
data security. It contains short term resource management policy for cpu and main
memory: process scheduling, page replacement strategies and the like. UCLA Unix is a
demand paged system; when a process page faults, the scheduler is informed by the
kernel so that the appropriate call may be issued by the
scheduler. All of its security relevant actions are accomplished through kernel
instructions, however.

Thus in normal operation a user first logs into the initiator. That process
then sends a message to the policy manager, who initializes a process for the user
and moves the user terminal to the new process by issuing appropriate capabilities.
Process initialization as well as normal computation take place within the domain of
the given process. Additional resource requirements or file activity is accomplished
through messages to the policy manager. Process switching occurs whenever a given
process invokes the scheduler process, or when an appropriate clock interrupt forces
such an invoke. The scheduler can then run whatever process it wishes. Page faults
also force an invoke of the scheduler, so that it can initiate appropriate page
swapping.

*One might say they are within the "security perimeter." Their size is not large
compared to the kernel described here.


The  UCLA  Kernel  and  Abstract  Types 


The kernel can alternately be viewed as a basic, stripped down operating system
or as an implementor of a number of abstract types, together with the operations on
those types. One of its more notable features is the fact that a significant number
of facilities, normally found in large systems, are included in it despite its very
small size and straightforward structure. The basic kernel consists of approximately
760 lines of Pascal code, not including I/O support. The PDP-11 does not have any
channels, so that the functions of channel programs must be written as cpu code. I/O
support in the UCLA kernel is composed of two portions: a device independent internal
interface of approximately 300 lines, and as many device dependent drivers as are
required by devices present on a given machine configuration. These are quite small,
and for UCLA, which supports many peripherals, approximately [figure illegible] lines
of code are required altogether. These numbers are relevant because the entire
kernel must be subject to verification procedures. Given current verification
capabilities, this quantity of code is not unreasonable (assuming a clean structure).


The UCLA kernel implements a fixed number of types, the four listed below. Type
extensibility as illustrated by CAL-TSS or Hydra is not provided, although simple
extensions are now under way to provide a limited form of this facility. The
implemented types, together with the permitted operations, are discussed below.


3.1 Processes

The process object is defined to consist only of the usual state variables plus
one small page. It does not include the process virtual memory. As a result, kernel
calls such as Invoke can be quite simple, merely moving data from tables to cpu
registers and vice versa. All process relevant kernel calls are controlled by
capabilities. It is not possible to issue or receive a Notify for example unless in each
case a capability is present in the process' C-list.
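
The rule that a Notify can be neither issued nor received without a capability can be sketched as follows. This is an illustration only, not the UCLA kernel's code (which the paper says is written in Pascal): the `Kernel` class, the representation of a C-list as a set of (object, right) pairs, and the right names "notify" and "receive" are all assumptions.

```python
class CapabilityError(Exception):
    """Raised when a kernel call is attempted without the needed capability."""


class Kernel:
    def __init__(self):
        self.c_lists = {}  # process id -> set of (object name, right) pairs
        self.pending = {}  # process id -> list of delivered notifications

    def create_process(self, pid, capabilities=()):
        self.c_lists[pid] = set(capabilities)
        self.pending[pid] = []

    def _check(self, pid, obj, right):
        # Every process relevant call is validated against the caller's C-list.
        if (obj, right) not in self.c_lists[pid]:
            raise CapabilityError(f"{pid} lacks {right} on {obj}")

    def notify(self, sender, receiver, data):
        # Both ends must hold a capability: one to issue, one to receive.
        self._check(sender, receiver, "notify")
        self._check(receiver, sender, "receive")
        self.pending[receiver].append((sender, data))


kernel = Kernel()
kernel.create_process("p1", {("p2", "notify")})
kernel.create_process("p2", {("p1", "receive")})
kernel.create_process("p3")  # holds no capabilities at all
kernel.notify("p1", "p2", "wake")
```

Here a call such as `kernel.notify("p3", "p2", "x")` raises `CapabilityError`, since p3 holds no capability for p2.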

The process abstraction has been carefully developed to permit a large number of
processes to be alive: 500 on a PDP-11 would not be unreasonable. To do so, it is
necessary that very little locked down memory be required per process, despite the
fact that there are asynchronous events taking place (such as I/O completions and
Notifies) which can occur when all the memory of a process is swapped out. The process
must be notified of these events. However, the obvious solution, kernel queues, is
undesirable since it increases verification difficulties and leads to overflow
problems when queue space is exhausted. The UCLA kernel avoids this problem by a number
of methods, including a generalized page faulting structure and efforts to keep as
much per process information as possible in swappable pages allocated to the given
process. As a result, less than [illegible] words of main memory must be reserved for
an inactive process.

The operations on objects of type process are:

a. Invoke

b. Initialize

c. Map-relocation-register

d. Notify

e. Sleep

Invoke moves the state variables of a process into the cpu registers, after first
saving those of the currently running process, mostly into one of that process's
pages. Initialize clears the state variables of a process and creates those few
capabilities needed for the process to bootstrap itself. The Map call is the means
by which a process can adjust its own virtual memory. The call sets the mapping
between blocks in the process address space and entries in its C-list (which
presumably point at pages). Notify is the mechanism by which one process can interrupt a
set of other processes, also passing a very small amount of data. Sleep invokes the
scheduler.


3.2 Pages


Pages are the abstract storage unit supported by the kernel. All pages have a
fixed home location on secondary storage, which is not deallocated when the page is
swapped into main memory. There are 3 page sizes in the current implementation, with
memory frame sizes currently set at sysgen time to minimize kernel complexity. In
order to access a page, a process must first obtain a capability for the page. Then
the Map call is used to specify where in the process' virtual address space the page
specified by the capability is to appear. At that point the process can attempt to
refer to the page. If it is in core, the hardware register will be loaded and the
reference will succeed. If not, the process will page fault as described in section
2. Since each page is a separate object, controlled sharing of individual pages is
easily done.


The only operations on pages are:

a. Swap-in

b.  Reflect 

Swap-in  copies  the  secondary  storage  version  of  a  page  into  main  memory,  changing  the 
name  of  the  object  associated  with  that  destination  page  frame  to  the  new  page.  The 
secondary  storage  copy  is  preserved.  Reflect  updates  the  secondary  storage  version 
to match main memory. Neither of these operations gives the caller access to the
contents of the page, so that either operation can be issued by untrusted code.


3.3 Devices


I/Os to all devices, including terminals, are controlled by the same capability
mechanism as all other operations. However, devices such as terminals are treated as
two devices: an input part and an output part. Two capabilities are therefore required
to read and write a terminal, but as a result more robust security policies
can be supported.

Completion interrupts are handled just like any other process notification. All
those processes with capabilities to receive interrupts from the device, and with
interrupts enabled, will receive a notification when the device generates it.

The device operations are as follows:

a. Start-i/o

b. Completion-interrupt

Start-i/o  initiates  all  I/Os  except  swaps.  The  Completion-interrupt  is  the  hardware 
generated  call  which  typically  signals  completion  of  a  previously  started  I/O.  As  an 
entry  point  into  the  kernel,  it  is  little  different  from  any  other  call. 


3.4 Capabilities

The capability is the basic kernel representation of protection information:
which objects a process is entitled to access. Each process has associated with it a
C-list containing those capabilities, stored in pages that can be swapped, but which
are directly accessible only to the kernel.*


* The policy manager is given read access to capability pages so that it need not
keep separate track of which capabilities for pages in a file are outstanding. See
the discussion of the policy manager for further information.

Each capability consists of four fields. First is the name of the object to
which this capability refers. Second are the access rights provided. Next is a
"guess" value which the kernel uses to attempt to quickly find the entry in a kernel
table which maps the object indicated by the capability to a physical location. In
the case of pages, the guess is the index into the kernel page table to the slot
where that page entry last appeared. It may in fact have been moved by subsequent
Swaps and Reflects, so if the entry does not match, a search of the table is
required. That event is rare, however. The last field in the capability is of no
relevance to the kernel, but can be set via the Grant call. The Policy Manager uses
it to record the file to which the page or device belongs.
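
The role of the "guess" field can be sketched as follows. This is a minimal Python model under stated assumptions, not the kernel's actual data layout: the names `Capability`, `PageTable`, and `lookup`, the tuple form of object names, and the refreshing of the guess after a search are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Capability:
    name: tuple          # kernel name of the object (assumed form: (device, block))
    rights: frozenset    # access rights provided
    guess: int           # slot where the table entry last appeared
    tag: object = None   # field of no relevance to the kernel; set via Grant

class PageTable:
    """Hypothetical kernel page table: a fixed list of slots holding page names."""
    def __init__(self, size):
        self.slots = [None] * size

    def lookup(self, cap):
        """Return the slot index holding cap.name, or None if it is not in core."""
        g = cap.guess
        if 0 <= g < len(self.slots) and self.slots[g] == cap.name:
            return g                      # common case: the guess is still right
        for i, name in enumerate(self.slots):
            if name == cap.name:          # rare case: entry moved, search the table
                cap.guess = i             # assumption: remember the new slot
                return i
        return None
```

The fast path costs one comparison; the linear search is only taken when a Swap or Reflect has moved the entry since the capability was last used.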


The operations on capabilities are quite limited: they can be granted and
revoked. Revocation is accomplished by granting the null capability into the C-list
slot that contains the capability to be revoked. Thus there is no means by which
processes can directly pass capabilities. While this fact limits what can be done
with capabilities, it also greatly simplifies many issues and avoids a number of the
criticisms of certain capability systems, especially the danger of not knowing how
access to an object has propagated. As a result, the kernel can more accurately be
viewed as containing no security policy. All such decisions regarding rights
transfer, including initial granting of rights, are made only by the software running
in the process which has the ability to issue Grants. The Policy Manager is the only
such process in UCLA Unix.


The  only  operation  on  capabilities  is 
a.  Grant/revoke 

It  adds  a  specified  capability  to  a  specified  slot  in  a  specified  process’  C-list. 
This  call  is  restricted  to  the  policy  manager,  who  implements  security  policy. 
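
A minimal sketch of Grant and of revocation by granting the null capability, written in Python. The class and method names, the use of `None` as the null capability, and the integer process identifiers are assumptions for illustration only; the report does not specify the call's actual interface.

```python
NULL_CAP = None  # hypothetical representation of the null capability

class Kernel:
    def __init__(self, policy_manager_id):
        self.policy_manager_id = policy_manager_id
        self.clists = {}  # process id -> list of C-list slots

    def add_process(self, pid, slots=8):
        self.clists[pid] = [NULL_CAP] * slots

    def grant(self, caller, target, slot, cap):
        """Place cap in the given slot of the target process' C-list."""
        if caller != self.policy_manager_id:
            raise PermissionError("Grant is restricted to the policy manager")
        self.clists[target][slot] = cap

    def revoke(self, caller, target, slot):
        """Revocation is just a Grant of the null capability into the slot."""
        self.grant(caller, target, slot, NULL_CAP)
```

Because every change to a C-list goes through this one restricted call, the kernel itself never has to decide who may hold which rights.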


The C-list composes a local name space for the process. This name space has two
effects. First, through message exchanges with the policy manager, the user has
complete control over which C-list slot contains a given capability, thereby permitting
local management over the name space. Fabry [Fabry 74] points out the significant
advantages of this facility. Second, kernel names are not visible to user code.
Instead, the capability contains that name. Therefore user code, being unaware of the
actual object names, cannot use them for a confinement channel.


rstems 


Other authors [Schroeder 77] have noted that the usual views of abstract types
to be found in programming languages are not quite suitable for operating systems
because of finite resources and circular dependencies. In Multics, for example, the
process manager depends on the page abstraction, since the manager is contained in
pages, while the page manager is a process and hence depends on the process manager.
In a revised design for Multics, abstract types are used in a sophisticated, multiple
layered manner to solve these problems [Schroeder 77]. However, as noted by Gaines
[Gaines 77], the method required need not involve a sophisticated solution at all,
and is largely composed of static allocations.


This is the approach embodied in the UCLA kernel. Processes, pages, and devices
are neither created nor destroyed. There are as many pages as there is space on
secondary storage for them. The number of processes is fixed by the size of the
kernel process table. Devices are added at system generation time. This static view is
not really a limitation, since the Policy Manager reuses process "bodies" and pages
by reinitializing them via kernel calls. Many systems include these size limitations
anyway, although perhaps not so explicitly. As a result, the kernel type structure
is exceedingly simple, and yet robust enough for fairly general operating system
activity, as illustrated in section 6 on Unix Functionality. Further, the entire
kernel is small enough to be locked down in main memory, in space removed from page
management, blocking circular dependencies.

3.6 Kernel Names

The names for kernel supported objects were designed to maintain several important
properties with the minimum of mechanism: a) unique names for all objects, b)
clear knowledge of object types at all times, and c) avoidance as much as possible of
complex name to location mappings, which must be maintained by kernel code if object
protection is to be at all meaningful. Since these names are not visible to normal
user processes, which see only C-list indexes, considerable design freedom was present.
Therefore, names were chosen to represent the home location of the object; a page
name consists of the disk device and block number. Hence no disk map need be
maintained or interrogated by the kernel.
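
The consequence of naming a page by its home location can be sketched in a few lines of Python. The `PageName` type, the `BLOCK_SIZE` value, and the `home_location` helper are hypothetical; the only point illustrated is that the location is computed from the name itself rather than looked up in a map.

```python
from typing import NamedTuple

class PageName(NamedTuple):
    device: int   # disk device holding the page
    block: int    # block number of the page's home location

BLOCK_SIZE = 512  # assumed block size in bytes, for illustration only

def home_location(name):
    """Return (device, byte offset) directly from the name; no disk map is consulted."""
    return name.device, name.block * BLOCK_SIZE
```

Two distinct pages necessarily have distinct (device, block) pairs, so uniqueness of names comes for free from this choice.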


3.7 Paging, Segmentation and Scheduling

UCLA Unix, unlike standard Unix, is a demand paging system. All user disk I/O,
including swapping of the process virtual memory space and file activity, occurs via
the paging mechanism.*


Page faulting is invisible to all processes except the scheduler, which is notified
by the kernel when a fault occurs, so that it can start a swap. There are actually
two "faults" involved in accessing pages. The most significant, just described,
occurs when a page is not core resident. The other, called a register fault, occurs
when the page is in core but the relevant page register is not loaded. This case is
handled in a highly efficient way: the user map table is checked by the kernel to see
which capability (and therefore which page) is desired. The appropriate value is
then placed in the register and user execution continues.
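
The register fault path can be sketched as follows. This is a hedged Python model: the dictionaries standing in for the user map table, the C-list, the set of core-resident frames, and the hardware registers are all assumptions, and capabilities are reduced here to bare page names.

```python
class NotResident(Exception):
    """The page is not in core; the scheduler would be notified to start a swap."""

def handle_register_fault(block, user_map, clist, core_frames, registers):
    """Load the page register for an address-space block, if the page is in core."""
    slot = user_map[block]        # which C-list slot is mapped at this block
    page_name = clist[slot]       # capability, reduced here to the page name
    if page_name not in core_frames:
        raise NotResident(page_name)   # the other kind of fault: a true page fault
    registers[block] = core_frames[page_name]
    return registers[block]
```

No I/O is involved on the register fault path, which is why it can be handled entirely within a short kernel call.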


* A logical disk can alternatively be treated as a device, and Start-I/Os issued to it.
However, a disk treated in this manner cannot also hold pages.

The preceding outline indicates how the UCLA system provides a complete virtual
memory and file system with only a simple set of paging primitives in the kernel.
This simplicity was achieved by two major decisions. First, the virtual memory
facilities were decomposed into that which had to operate correctly in order to
maintain the security and integrity of the system (Swap, Reflect, and
Completion-interrupt) and the rest of the virtual memory mechanism (page replacement
algorithm, interaction with cpu scheduling, etc.). This decision had a significant
effect on the system's resulting simplicity. Second, file activity and process memory
swapping were combined into one mechanism. In standard Unix, main memory is broken
into two areas: one to hold user process images, and the other for I/O buffers. Each
area is managed separately. The I/O buffers are replaced in LRU order, while scheduling
of process images is handled differently. All disk I/O buffers are the same size, while
process images vary. The code used to handle I/O buffers is in large part different
from that used to handle the movement of process images, and significant parts of
both collections of code are important to the system's security and integrity.


In  UCLA  Unix,  only  one  mechanism,  paging,  exists,  and  much  of  its  support  has 
been  moved  out  into  a  scheduler  which  can  not  affect  the  integrity  of  the  system.  As 
explained  earlier  in  the  section  on  capabilities,  the  user  domain  also  carries  some 
of the responsibility for virtual memory management. By placing some of the
responsibility in the domain for which the action is being taken, [...] is
further limited. Application code is of course unaware of that responsibility, since
the  o.s.  interface  is  performing  the  task. 


3.8 Firmware Implementation

The UCLA kernel has been developed to be a candidate for firmware implementation.
To be practical, it is helpful if each call behaves as much as possible as a
separate instruction, with no need to be interrupted in execution, nor to issue I/O
calls for which the results affect the instruction's behavior, since I/O is typically
slow relative to microprogram cycle speeds. These criteria are met by the UCLA
kernel. Therefore it differs significantly from architectures such as Multics or
related work [Millen 76][Organick 71]. In both of those systems all of the operating
system, including inner rings in Multics and kernel software in the case of Mitre, must
be considered as part of the user process. Any process can be suspended in the
middle of execution in the inner ring or kernel mode, respectively. Neither of those
systems lends itself to firmware considerations, the Mitre work because of the
architecture, and Multics because of its size and architecture.

3.9 Verification Impacts

Verification of a full scale operating system is a multistep process, and the
methods employed at UCLA are outlined by Popek [Popek 78], with more detail available
from Kemmerer [Kemmerer 78]. The effect that the verification and certification goals
had on the system architecture was exceedingly positive. Often a design choice
presented itself, without any clear basis for resolution except maximizing
verification ease. In retrospect this criterion was quite effective in making decisions
and avoiding design pitfalls. Further, when it became clear subsequent to implementation
of certain parts of the system that verification would be difficult, those portions

r  •-■■..-''''e  i.op'.rd  ,  g'-'oi  >1'  'f  tM.?  .i'-  oucl  i.c  '.acticri  o-i.'  . 

3.9.1 Sequential Code

The current state of verification tools does not permit proof of parallel
programs. Since semi-automated aids are in our view essential, this constraint implied
a kernel design and implementation in which each call ran from start to completion
without interruption, including the interrupt handlers. The UCLA kernel is built in
this way, and so most of it can be proven by standard verification methods.


The cost of this design choice results from delayed servicing of interrupts
which arrive while a kernel call is in progress. To minimize this problem, each call
is designed to run very quickly: approximately one millisecond or less. To do so, no
kernel call may do I/O of its own while in the midst of execution, since virtually
all devices respond rather slowly relative to this criterion. While millisecond
delays in interrupt servicing may not be suitable for heavy real time activity, they
appear quite acceptable for interactive systems, which is the nature of Unix.


3.9.2 I/O Interface


The PDP-11 does not have any significant channels; instead the device registers
are wired into physical address locations and "channel" functions are executed by cpu
code. Since all devices address main memory (and secondary storage) in terms of
absolute addresses, I/O management is therefore necessarily a kernel responsibility.
That is unfortunate, for several reasons. First, device semantics are quite complex
and difficult to interface with the semantics of the programming language in which
kernel code is written. Next, devices are probably the single largest source of
changes to the kernel, since as new types of devices are added, additional verified
kernel code is required to manage the device's actions. To minimize the impact of
these problems, kernel I/O code was redesigned to provide a device independent level
of I/O within the kernel. Code above that level is not concerned with
any of the device details. Code below it implements device dependent issues, including
any device dependent protection controls. The I/O abstraction level appears
similar to a channel interface, with well defined opcodes and operands.

This I/O abstraction level is quite important, likely more so than the process
abstractions mentioned by other authors, since at least half of the operating system
kernel is concerned with I/O [Schroeder 77][Millen 76]. As a result of its use, device
semantics have been isolated to the low level drivers. See Walker for more
information [Walker 77].

4. The Policy Manager

The Policy Manager is the major security relevant process in UCLA Unix. It is
responsible for implementing a shared file system, for maintaining whatever security
policy is to be supported by the system, and for part of the action of process
initialization, which occurs every time a Unix fork operation takes place. Each of
these issues is discussed below. Long term resource allocation can also be
implemented in this process, but currently is not.


4.1 The File System and Protection Policy

User code must see a file structure which is identical to the Unix tree of
directories. However, one should not immediately conclude that the entire directory
structure and other file support should be implemented in trusted code. In fact, one
can make the following argument, largely independent of the security policy to be
enforced.


Most code to be run in the user domain strictly should not be trusted to be
correct, at least not to the same standards as the verified secure kernel and policy
manager. However, all names, including file names, are either issued, interpreted or
transmitted through that code. Therefore it makes little sense to verify the
directory naming scheme of a file system when significant amounts of unverified code
issue the names or are in the path leading to the file system. The best one can do, it
appears, is to provide the user with a reliable means to specify a process profile
which characterizes the categories of files to which the process is to be allowed
access. Profile specification and alterations, together with the association of labels
with the files on which categories are based, must therefore be done in a guaranteed
reliable way if the verified protection and integrity of the entire operating system
is to have any meaning. That necessary secure terminal facility is discussed in
section 7 below.

The file protection labels provided in UCLA Unix consist of a very large variety
of "colors". Each file can be labelled with some number of them. Each user (principal
in Saltzer's terminology [Saltzer 75]) has a fixed color list associated with
him. It is understood that a user potentially can access a file only if his color
list covers that of the file. The actual profile for a running process can be set to
any subset of the user's color list. There is a separate profile for read and write.

Since there are a large number of colors, many of the usual protection policies
can be implemented using them. Public files are labelled with the color public and
all users have that color in their list. Denning has noted that military security
policy is essentially a lattice, and that the relations of sets and subsets provide
just the lattice required. Individual file names are obtained by assigning a given
color to a single file. This color system is still evolving as experience is gained
with the user protection interface, especially in the area of control over changes to
color lists. Additional detail is provided by Urban [Urban 78].
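
The color rules lend themselves to a short set-based sketch in Python. The function names and the sample colors are hypothetical, and "covers" is interpreted here as the superset relation, which is an assumption consistent with the set/subset lattice mentioned above rather than a definition taken from the report.

```python
def valid_profile(profile, user_colors):
    """A process profile may be any subset of the user's fixed color list."""
    return set(profile) <= set(user_colors)

def may_access(profile, file_colors):
    """Necessary condition for access: the profile covers every color on the file."""
    return set(file_colors) <= set(profile)
```

For example, a user holding {public, secret, payroll} who runs a process with read profile {public, secret} could read a file labelled {public, secret} but not one labelled {payroll}, even though the user's full color list would cover it.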


Given the preceding view of file system protection, one can profitably decompose
its implementation into two parts, one a common mechanism relevant to security and
integrity, the other executable in the domain of the requesting user process. The
common mechanism can support a simple, flat file system. Files are [...] data type,
and a color list is one of the attributes of a file. The simple
file system mechanism must include complete space management: disk free lists and
[...] these data structures.


Many of the facilities normally thought of as part of the file system can be
provided by software in the individual process domains as part of the o.s. interface:
directory structure, maintenance, and searching; end of file indicators and other
file status information such as usage locks. Directories are then contained in
files, and access to directories is controlled in the same way as access to any other
files. Assuming that the common mechanism in the policy manager is verified correct,
users can affect one another only through the use of files to which they share
access.


4.2 Process Initialization and Forking


The policy manager must also be involved when new processes are created, since a
kernel process body must be initialized and appropriate capabilities need to be
granted to the new process. As much as possible, however, one wishes process
bootstrapping to take place within the domain of the new process. In UCLA Unix, the
normal procedure for process forking is as follows. The requesting process sends a
message to the Policy Manager requesting the new process as a member of the same user
family. The Policy Manager records the user to be associated with the new process
and issues a kernel Initialize call, which zeroes a process body, grants two
capabilities to that process, and sets the program counter and status to standard
values. The capabilities point to a standard boot code page and a scratch data page
respectively.* A third capability is granted by the policy manager upon process request
to give the process the ability to communicate with its forking parent. From here on,
initialization takes place wholly in the domain of the new process. The process
[...] handled normally. Eventually the boot code will load the o.s. interface and
presumably a Unix Shell into its address spaces.
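
The forking procedure can be sketched as a small Python model. Everything named here is an assumption for illustration: the `Kernel` and `PolicyManager` classes, the string placeholders for the two standard pages, the choice of the first unused process body, and the `("ipc", parent)` stand-in for the third capability.

```python
BOOT_CODE_PAGE = "boot-code-page"      # hypothetical names for the two standard pages
SCRATCH_PAGE = "scratch-data-page"

class Kernel:
    def __init__(self, nprocs):
        self.bodies = [None] * nprocs  # fixed-size process table

    def initialize(self, pid):
        # Zero the process body, grant the two bootstrap capabilities,
        # and set program counter and status to standard values.
        self.bodies[pid] = {"clist": [BOOT_CODE_PAGE, SCRATCH_PAGE], "pc": 0, "status": 0}

class PolicyManager:
    def __init__(self, kernel):
        self.kernel = kernel
        self.user_of = {}    # pid -> user
        self.parent_of = {}  # pid -> forking parent

    def fork_request(self, parent_pid):
        # Pick an unused process body (raises StopIteration if the table is full).
        pid = next(i for i, b in enumerate(self.kernel.bodies) if b is None)
        self.user_of[pid] = self.user_of[parent_pid]  # same user family
        self.parent_of[pid] = parent_pid
        self.kernel.initialize(pid)
        return pid

    def parent_link_request(self, pid):
        # Third capability, granted only when the new process asks for it.
        self.kernel.bodies[pid]["clist"].append(("ipc", self.parent_of[pid]))
```

After `fork_request` returns, everything else in this model (loading the o.s. interface, mapping further pages) would be driven by the new process itself, which mirrors the goal of keeping bootstrapping inside the new process' own domain.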


* The boot code is actually the Kernel Interface Subsystem discussed in section 5.


4.3 Other Policy Manager Responsibilities

In UCLA Unix, the Policy Manager is also responsible for control over access to
the other kernel supported objects besides pages: processes and devices. Devices
appear as special files and inter-process communication takes place through pages which
appear as part of a file. Therefore, colors are uniformly employed for access
control in these cases too.

An ARPANET connection is provided in UCLA Unix; access to it must be controlled
and support for initial network connection activities is required. Access control is
done by making each host a special file and using colors. See section 8 below for a
discussion of initial connection protocols.


5. The Kernel Interface Subsystem


Since the kernel is an operating system nucleus of minimum size and complexity,
one can properly expect that it is not a convenient base to build on. Traditional
systems provide a good deal of "extension" for convenience. While at first glance
the o.s. interface has this responsibility, it should be noted that a considerable
amount of code is written to run directly on top of the kernel: the o.s. interface,
the network manager, process initialization, and the scheduler, for example. Each of
these needs basically the same extensions: capability management, inter-process
communication support, virtual memory code, and some file system interfaces. Therefore
we have developed an intermediate interface between the o.s. interface and the
kernel. The software which implements it provides a much more convenient interface to
the kernel and is called the Kernel Interface Subsystem (KISS). As an extension
mechanism, the KISS [...] the entire [...] of the process. In particular, no
other code in the process makes kernel calls, sends messages to the scheduler or
policy manager, etc. Thus this software package has primary responsibility for
maintaining a convenient "virtual machine" for the user process.


The KISS of course runs as part of the user process domain, and is architecturally
contained in the same address space of the process as the o.s. interface. The
KISS can be viewed as an inner ring in the sense of Multics, and if appropriate
hardware were available, that would be an effective means of implementation.

6.  The  Unix  Interface 

The operating system interface has the responsibility of providing a user program
interface which is as much as possible identical to standard Unix.* It handles
user system calls either by performing them itself if possible, or making the
appropriate kernel calls or service requests to the policy manager to get the desired
action accomplished. Much of the Unix o.s. interface is actually lifted from the
standard Unix operating system. Most of the changes consist of wholesale deletions
of functions, resulting from the fact that many of those functions are redundant
given the available kernel facilities and the fact that the o.s. interface is
essentially a single user system. All scheduling support could be removed, since
scheduling is done in a separate process. A more drastic change concerns I/O buffering.
In standard Unix, buffers contain significant structure to aid in multiuser and LRU
operation. In UCLA Unix, most of that function disappears since it is done by the
paging mechanism supported by the kernel and scheduler. I/O support is replaced in
the o.s. interface by code that requests file opens and relevant page capabilities
from the Policy Manager, and issues Map calls to add those pages to the interface's
virtual memory. It then merely tries to access the data on the page to
move it to the user, and the usual page faulting and swapping action takes place.

New code in the interface consists of the KISS, [...] the
interface/KISS boundary, ipc support, and maintenance of the process hierarchy. This
last issue is discussed below.


* There are certain actions possible in standard Unix which will be blocked by the
security policy of the secure system.

6.1 The File System

The  Unix  interface  has  a  significant  portion  of  the  responsibility  for  making 
the  user  view  of  the  file  system  equivalent  to  standard  Unix.  This  task  consists  of 
all  directory  support,  including  searching,  working  directory  control  and  the  like. 
Once  the  desired  logical  file  name  is  found  in  a  directory,  a  file  open  request  of 
the  policy  manager  can  be  made  using  that  name.*  Directory  searches  are  done  by  first 
opening  the  containing  file,  like  any  other.  It  is  the  responsibility  of  the  Unix 
interface  to  manage  its  open  files  in  such  a  way  as  to  keep  the  working  directory 
open  most  of  the  time  to  minimize  search  costs. 


6.2  Forking  and  Process  Hierarchies 


In standard Unix, a given user can have a process family active for him. The
family is hierarchical in the sense that parents have certain rights over children.
However, intra-family protection is not really effective, since any member of a
family can convince any other member to destroy itself, and to take other undesirable
actions, via standard Unix functions.


Therefore process hierarchies should not be supported by kernel code, and so in
UCLA Unix, members of a process family cooperate among themselves to effect family
behavior. Of course, the support for process families is provided in the o.s.
interface, so that user software need not be concerned. This design choice simplified the
kernel, and in light of the observations made above, had little or no effect on the
actual protection functionality provided.


In the implementation, each process of a family has a capability for a shared
page, set up by family members. In that page, data structures are maintained by the
o.s. interface so that intra-family relationships are properly supported. In doing
so, the kernel notification facility is used to great advantage. Unix typically
performs a great deal of "one to n" notification: one process issuing a signal intended
for the rest of the family. The kernel Notify call is designed to support this
behavior efficiently, as well as to be adaptable for other uses.

* The logical file name is essentially an inode number.

7. Secure User Interface

In order for any user to have assurance that the protection controls of a system
are operating in the manner desired, it is crucial that he be sure of the values to
which protection policy data has been set. Further, when login takes place, there is
an issue of mutual authentication: the user wishes to be sure that he is interacting
with the secure system interface, not some clever user simulation of it which
collects passwords. For both of these reasons, UCLA Unix contains a small dialoguer
process to which the user terminal can be reliably connected. The user causes his
terminal to be switched to the dialoguer by typing a predefined sequence of break
characters.* The kernel supports the terminal switch through maintenance of terminal
modes. A terminal can be thawed or frozen. Capabilities are granted by the Policy
Manager giving access to terminals only when thawed, or only when frozen. When the
break sequence is received, or when a line drops, the line is made frozen.
The Policy Manager grants frozen access only to the dialoguer, thawed access in all
other cases. In this way, the user can move his terminal to the dialoguer, accomplish
whatever change is desired, such as changing process profiles, and then move
the terminal back, all without disturbing the state of computation of the process at
all so that it can be continued.
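
The thawed/frozen terminal modes can be modelled in a few lines of Python. The `Terminal` class, the `can_use` check, and in particular the `thaw` method are hypothetical; the report does not say here which call returns the terminal to the thawed mode, so that step is an assumption.

```python
class Terminal:
    def __init__(self):
        self.mode = "thawed"       # normal state: user processes may use the terminal

    def break_sequence(self):
        self.mode = "frozen"       # kernel freezes the line on the break sequence

    def line_drop(self):
        self.mode = "frozen"       # a dropped line is treated the same way

    def thaw(self):
        self.mode = "thawed"       # assumed step that hands the terminal back

def can_use(terminal, cap_mode):
    """cap_mode is the single mode ('thawed' or 'frozen') in which the capability is valid."""
    return terminal.mode == cap_mode
```

Since only the dialoguer holds a capability valid in the frozen mode, a user process that imitates the login dialogue loses access to the terminal as soon as the user types the break sequence.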


* Kernel recognition of the break sequence is not expensive since PDP-11 hardware
requires character by character terminal input handling anyway.

The Scheduler

Whenever  it  is  time  for  a  process  invocation  decision  to  be  made,  the  Scheduler 
is invoked, either directly by a user process (i.e. when it wishes to sleep) or by a
clock  interrupt.  The  kernel  posts  a  considerable  amount  of  data  to  the  scheduler 
process,  so  that  it  can  make  sophisticated  resource  allocation  decisions,  about  both 
memory  and  the  cpu.  Centralizing  both  classes  of  resource  control  permits  effective 
coordination  of  allocation  decisions  and  therefore  potentially  higher  performance.  A 
large  class  of  scheduling  policies  can  be  implemented  in  this  process.  Some  of  them 
have  confinement  implications  but  provide  better  performance  potential  than  those 
which  do  not.  This  architecture  permits  the  system  operator  to  make  the 
confinement/performance  tradeoff,  since  there  is  no  kernel  effect  from  scheduling 
policy  changes. 
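
As a rough illustration of the tradeoff, the sketch below contrasts two policies that could run in such a scheduler process. It is not the UCLA scheduler: the function names and the shape of the posted data are hypothetical. The fixed rotation never lets one process's readiness influence when another runs, at the cost of idle slots. The work-conserving policy wastes no slots, but a process can observe how often it runs and so learn whether its neighbor is ready, which is a confinement channel.

```python
def fixed_rotation(posted, slot):
    """Slot i always belongs to process i mod n; an unready owner idles the slot."""
    pids = sorted(posted)
    pid = pids[slot % len(pids)]
    return pid if posted[pid]["ready"] else None


def work_conserving(posted, slot):
    """Start from the same owner, but skip ahead to the next ready process."""
    pids = sorted(posted)
    n = len(pids)
    for k in range(n):
        pid = pids[(slot + k) % n]
        if posted[pid]["ready"]:
            return pid
    return None  # nothing is ready


def kernel_dispatch(policy, posted, slots):
    """Stand-in for the kernel: post data, ask the policy, run the answer."""
    return [policy(posted, s) for s in range(slots)]


# Data the kernel might post: process 2 is asleep.
posted = {1: {"ready": True}, 2: {"ready": False}, 3: {"ready": True}}
print(kernel_dispatch(fixed_rotation, posted, 6))
print(kernel_dispatch(work_conserving, posted, 6))
```

Because kernel_dispatch takes the policy as a parameter, swapping one policy for the other changes nothing in the kernel stand-in, which mirrors the point that scheduling policy changes have no kernel effect.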

The  one  potential  drawback  of  a  separate  scheduler  process  is  that  it  doubles 
the  actual  number  of  process  invocations  over  what  is  really  needed.  This  overhead 
is  of  little  consequence  if  context  switches  are  relatively  cheap,  and  this  will  be 
the  case  for  UCLA  Unix.* 

S..  Secure  Computer  Networks 

When  security  is  of  concern  in  a  computer  network,  encryption  of  the  lines  is 
generally  a  necessity,  because  those  lines  are  not  considered  safe  from  tapping  or 
spoofing. However, the usual approach is to encrypt and decrypt the data external to
the  central  machine  and  its  operating  system. 


* Context switches on the PDP-11 are in general fairly slow. Therefore, the
scheduler is actually to be run, still as a separate process, in kernel mode of the
hardware. This avoids the necessity of extensive state saving and restoring, but requires
the scheduler to be written in a language for which it can be demonstrated
that kernel data structures are not touched. The implemented scheduler is written in
UCLA Pascal. Moving it into kernel mode was not yet complete when this paper was authored.
It should be recognized that the software resident within the operating system
responsible for managing the network is both complex and relevant to security and integrity.
In standard Unix with an ARPANET Network Control Program (NCP), the NCP,
operating as a common mechanism, is of comparable size and complexity to the whole
operating system.* Typically, one wishes to protect each network connection separately
from each other connection, but the NCP manages them all, including moving data
from user buffers through the NCP and out to the network interface device.

Given  the  availability  of  a  secure  operating  system,  one  can  entertain  the  idea 
of  extending  the  "ends"  of  the  encryption  path  deep  into  the  operating  system.  For 
example,  the  user  process,  as  it  hands  data  over  to  the  NCP,  could  be  forced  to  cause 
the data to be encrypted, so the network software is treated merely as part of the
insecure transmission channel. That data would not be decrypted until the receiving
NCP handed it over to the destination user. If each connection were encrypted with a
separate key, then NCP errors and misdelivery within the host operating system would
not affect security. If suitable error correction is incorporated with the encryption,
then integrity problems can also be detected.
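
The idea of one key per connection, with an integrity check attached, can be sketched as follows. This is a toy under stated assumptions and not the mechanism used in UCLA Unix. The keystream built from SHA-256 is for illustration only and is not a vetted cipher. An HMAC tag stands in for the error checking mentioned above. The names seal and open_sealed are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 32  # length of a SHA-256 HMAC tag in bytes


def _subkey(conn_key: bytes, label: bytes) -> bytes:
    # Derive separate encryption and integrity keys from the connection key.
    return hashlib.sha256(label + conn_key).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: hash of key, nonce and a running counter.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(d ^ s for d, s in zip(data, stream))


def seal(conn_key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Encrypt under the connection's key, then append an integrity tag."""
    enc_key = _subkey(conn_key, b"enc")
    mac_key = _subkey(conn_key, b"mac")
    body = _xor(plaintext, _keystream(enc_key, nonce, len(plaintext)))
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def open_sealed(conn_key: bytes, message: bytes, nonce_len: int = 8) -> bytes:
    """Check the tag first; decrypt only if the message is intact."""
    enc_key = _subkey(conn_key, b"enc")
    mac_key = _subkey(conn_key, b"mac")
    nonce = message[:nonce_len]
    body = message[nonce_len:-TAG_LEN]
    tag = message[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return _xor(body, _keystream(enc_key, nonce, len(body)))
```

If the network software delivers a message sealed for one connection to the holder of another connection's key, open_sealed raises an error and releases no plaintext, so the network software can be treated as part of the untrusted channel.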


The main problem in this approach is the initial connection establishment protocol:
how to permit users to supply the NCP with parameters telling which site and
what type of connection should be established, without large confinement channels in
the system. For a discussion of these and related issues, see Kline [Kline 78].
tional kernel code to support secure network operation was quite small. Further,
most of the original NCP was kept unmodified, although its lower level was altered to
match the kernel interface.*


*  The  NCP  being  considered  was  developed  at  the  Univ.  of  Illinois. 

* The Illinois NCP "kernel" was rewritten.


The programming language employed in software development is usually recognized
to have a significant effect on that effort; however, when the goal of development includes
verification, the effect is heightened. The specific language issues break
down here into two groups: those concerned with systems programming, and those concerned
with the scale of the verification steps.

Systems  programming  issues  arise  in  the  same  way  that  they  occur  in  most  high 
level  systems  programming  languages.  It  is  necessary  to  be  able  to  express  details 
of the hardware in the high level language, such as interrupt vectors, hardware device
registers, or special instructions. These facilities must be available in the
programming language, but in a way that minimizes the effect on the semantics of the
rest of the language.


Virtually all the security and integrity relevant code in UCLA Unix is written
in a slightly altered Pascal. Obvious verification problems were removed from the
language, such as pointers, variant records, and various sources of aliasing [Lampson
77]. I/O facilities were also deleted, since we were building I/O mechanisms, among
other  functions.  The  runtime  package  needed  to  support  Pascal  I/O  would  have  been 
baggage, and it typically would be written in [illegible], so that there would
be little chance of ever verifying properties of its operation.


gramming,  as  remarked  above.  Very  few  additions  were  actually  necessary,  and  were 
limited  to  the  following: 

a)  the  ability  to  declare  a  variable  to  be  stored  at  a  fixed  physical  location  (to 
initialize  interrupt  vectors,  access  device  control  registers,  etc.), 

b) assembly language procedures (so that special hardware instructions could be expressed
as a procedure call),

c) the ability to have procedures which take array parameters whose length is determined
at call time (to remedy the most significant limitation of Pascal).

We  also  developed  an  extensive  library  system  to  support  independent  compilation  of 
program  modules,  and  yet  force  type  integrity  across  module  boundaries.  The  compiler 
and  library  system  force  recompilation  of  modules  when  needed  for  compatibility  with 
another module which has been altered. This facility is needed since the verification
work depends on type enforcement. The language, compiler, and library system
are discussed by Walton [Walton 76].
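
One way to picture forced recompilation is the sketch below. It is an assumption about how such a library could work, not a description of the UCLA library system: each compiled module records a digest of every interface it was compiled against, and it becomes stale when any of those interfaces changes. The class and method names are hypothetical.

```python
import hashlib


def interface_digest(interface_text: str) -> str:
    """Fingerprint of a module's exported declarations."""
    return hashlib.sha256(interface_text.encode()).hexdigest()


class Library:
    def __init__(self) -> None:
        self.interfaces = {}        # module -> digest of its current interface
        self.compiled_against = {}  # module -> {dependency: digest seen}

    def publish(self, module: str, interface_text: str) -> None:
        """Record the (possibly altered) interface of a module."""
        self.interfaces[module] = interface_digest(interface_text)

    def compile(self, module: str, deps: list) -> None:
        """Compile a module, remembering which interfaces it relied on."""
        self.compiled_against[module] = {d: self.interfaces[d] for d in deps}

    def stale(self) -> list:
        """Modules that must be recompiled before they may be linked."""
        return sorted(
            m
            for m, seen in self.compiled_against.items()
            if any(self.interfaces[d] != h for d, h in seen.items())
        )
```

A linker that refuses any module listed by stale() would enforce type integrity across module boundaries, since no module could be combined with an interface other than the one it was checked against.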

There  are  many  issues  concerned  with  the  scale  of  the  verification  effort.  It 
is  believed  that  over  half  of  the  original  verification  effort  could  be  avoided  if 
the  language  contained  more  reasonable  controls  over  aspects  of  program  behavior. 
One of the more obvious examples concerns the integrity of global variables. An important
portion of the assertions to be verified states that most of the kernel variables
have not been altered by the routine being considered. (After all, much of the
statement of security concerns what is not to happen.) These assertions, in the form
of a large invariant, could be simply handled by scope controls in the language, such
as the Import/Export lists of Euclid [Lampson 77]. Then compile time enforcement
could be employed and the verification task correspondingly simplified. UCLA Pascal
has been modified to include Import lists.

Another example where the verification task can be eased concerns array bounds
checking. Array subscripts can easily be out of range, and therefore potentially
reference data other than the given array, violating type rules. There are
four reasonable ways to deal with this problem: subscript checking could be done by
hardware, by runtime software generated by the compiler, by runtime software explicitly
inserted by the programmer, or it could be verified in many cases that subscripts
do not get out of range. The PDP-11 hardware base does not provide any reasonable
way to itself check subscript references.* The UCLA Pascal compiler does not
implement array checking code. Therefore a combination of the remaining choices was
taken. The resulting assertions which need to be proven compose a significant fraction
of the total verification to be done. Clearly here is a fertile area for
language support or enhanced verification tools.
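
The hazard, and the option of checks inserted by the programmer, can be shown with a small model. It is a sketch only: a Python list stands in for a flat memory with no hardware bounds checking, and the class ArrayView and its methods are hypothetical names.

```python
class ArrayView:
    """An array laid out in a flat memory, as on a machine with no
    hardware subscript checking."""

    def __init__(self, memory: list, base: int, length: int) -> None:
        self.memory = memory
        self.base = base
        self.length = length

    def unchecked_get(self, i: int):
        # What unchecked compiled code does: address arithmetic only.
        return self.memory[self.base + i]

    def checked_get(self, i: int):
        # The explicit test a programmer would insert before the access.
        if not 0 <= i < self.length:
            raise IndexError(f"subscript {i} outside 0..{self.length - 1}")
        return self.memory[self.base + i]


# Cells 0..3 hold a four-element table; cell 4 holds unrelated data.
memory = [10, 11, 12, 13, 99]
table = ArrayView(memory, base=0, length=4)
```

Here table.unchecked_get(4) quietly returns the unrelated cell, while table.checked_get(4) raises an error. Where the check is omitted, the alternative is a proof that every subscript reaching the access already satisfies the same range condition.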


11.  Architectural  Observations 

UCLA  Unix  comprises  the  first  verifiably  secure,  full  functionality  operating 
system  with  a  fine  grain  of  protection.  The  experience  gained  in  its  design  and 
development leads us to several conclusions. Most obviously, secure operating systems
are feasible to develop, although the development cost is likely to be considerably
greater than if highly reliable security and integrity were not such a serious goal.
However, the result is a system which appears to exhibit considerably enhanced reliability
and integrity, and because of the strict modularity, is easier to modify.
Performance does not appear to be adversely affected by the architectural constraints
imposed by the various goals. That is, the net result of the security goal seems to
be a better system in general.


It should be noted however that one of the central ideas to the success of the
work, kernel structured architectures, requires considerable rethinking of the usual
operating system architecture views if it is to be effectively employed. Much of the
standard operating system wisdoms must be reexamined, or the result will be a "kernel"
which is overly complex and not suitable for rigorous demonstration of
correct security and integrity enforcement.

In conclusion, it appears that the goal of obtaining secure operating systems, at
least for centralized, medium scale machines, has been largely reduced to (high
quality) engineering, with the most significant progress required in program verification.


* The new, upward compatible DEC VAX-11/780 does.